Web Scraping with Python

About This Course

In this course, you will learn about web scraping using Python. We start with a brief introduction and show how to set up your software environment. We then step through the main pillars that make up the web and web pages, and discuss how to deal with each of them using Python. First, we talk about HTTP and how you can use the Requests library in Python. Next, we discuss how HTML can be parsed using Beautiful Soup, before returning to HTTP to discuss advanced concepts. We then move on to JavaScript and the Selenium package. The course closes with a broad range of topics, where we look at other packages, how web scraping relates to other domains such as data science, and some managerial aspects and legal concerns of web scraping, and ends with an overview of best practices.
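To give a flavor of the parsing step mentioned above, here is a minimal sketch that extracts links from a page. It is not taken from the course material: to stay free of extra packages it uses Python's built-in html.parser module rather than Beautiful Soup, and it works on a hard-coded string instead of a page fetched with Requests. The sample HTML and the names LinkCollector and extract_links are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page content, standing in for the body of an HTTP response.
SAMPLE_HTML = """
<html><body>
  <h1>Example listing</h1>
  <a href="/items/1">First item</a>
  <a href="/items/2">Second item</a>
  <a>No href here</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # href of the <a> tag being read, if any
        self._text_parts = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

In a real scraper the string would typically come from an HTTP response, and a library such as Beautiful Soup would replace the hand-written parser class with a few selector calls.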

The course provides a sound mix of theoretical and technical insights, as well as practical implementation details. These are illustrated by several real-life, hands-on examples. A Python tutorial is also provided.

The course features more than 2 hours of video lectures and various multiple choice questions. A certificate signed by the instructors is provided upon successful completion.

See this video to get a free teaser of the course contents.

We can also come and teach this course on-site in classroom format. If interested, please mail us at: Bart@BlueCourses.com.

Price

The enrollment fee for this course is EUR 250 (VAT excl.) per participant. Payments are securely handled by PayPal. If you are a company in the European Union, we can apply the VAT reverse charge. For this, please mail your VAT number to Bart@BlueCourses.com. Part of our course revenue is used to fund organizations involved in protecting and cleaning our oceans. See our about page to learn more about our mission statement.

Course Outline

Chapter 1: Introduction

About
What is web scraping?
Which use cases does web scraping enable?
Some introductory examples
Setting up

Chapter 2: HTTP with Python and Requests

What happens in a web browser
☞ What goes on behind the browser
The HyperText Transfer Protocol
☞ Talking HTTP with Python
Python HTTP libraries
☞ Requests hands-on
Using Requests
☞ Getting the weather with Requests
Quiz

Chapter 3: Parsing HTML and CSS with Beautiful Soup

Parsing HTML
Using Beautiful Soup
☞ Beautiful Soup hands-on
Cascading Style Sheets
CSS selectors in Beautiful Soup
☞ Further Beautiful Soup examples
Quiz

Chapter 4: Delving Deeper in HTTP

Forms and POST data
POST requests with Requests
☞ POST requests hands-on
Other HTTP methods
HTTP headers
Cookies
☞ Cookies hands-on
Cookies in Requests
Sessions in Requests
Other content: binary files, JSON
☞ JSON hands-on
Quiz

Chapter 5: Dealing with JavaScript with Selenium

JavaScript
Selenium
☞ Selenium examples
Quiz

Chapter 6: Closing topics

Other Python libraries and tools
Other programming languages
Command line tools
News articles
Commercial products
Web scraping vs. web crawling
Web scraping vs. AI and ML
Web scraping vs. RPA
<blank> scraping
Legal concerns
Web scraping as part of data science
The cat and mouse game
Closing best practices
Quiz

Course Staff

Prof. dr. Seppe vanden Broucke

Seppe vanden Broucke is an assistant professor at the department of Business Informatics at UGent (Belgium) and a lecturer at KU Leuven (Belgium). His research interests include business data mining and analytics, machine learning, process management and process mining. His work has been published in well-known international journals and presented at top conferences. He is also the author of the books Beginning Java Programming (Wiley, 2015) and Principles of Database Management (Cambridge University Press, 2018). Seppe's teaching includes Advanced Analytics, Big Data and Information Management courses. He also frequently teaches for industry and business audiences. See seppe.net for further details.

bluecourses: BC18