
Web Scraping with Python

In this course, you will learn about web scraping using the Requests, Beautiful Soup, and Selenium packages in Python.

About This Course

In this course, you will learn about web scraping using Python. We start with a brief introduction and show how to set up your software environment. We then step through the main pillars that make up the web and web pages, and discuss how to deal with each of them using Python. First, we talk about HTTP and how you can use the Requests library in Python. Next, we discuss how HTML can be parsed using Beautiful Soup, before returning to HTTP to discuss advanced concepts. We then move on to JavaScript and the Selenium package. The course concludes with a broad range of closing topics, where we take a look at other packages, how web scraping relates to other domains such as data science, and some managerial aspects and legal concerns of web scraping, followed by an overview of best practices.

The course provides a sound mix of theoretical and technical insights, as well as practical implementation details. These are illustrated by several real-life, hands-on examples. A Python tutorial is also provided.

The course features more than 2 hours of video lectures and various multiple-choice questions. A certificate signed by the instructors is provided upon successful completion.

See this video to get a free teaser of the course contents.

We can also come and teach this course on-site in classroom format. If interested, please mail us at: Bart@BlueCourses.com.

Price

The enrollment fee for this course is EUR 250 (VAT excl.) per participant. Payments are securely handled by PayPal. If you are a company in the European Union, we can apply the VAT reverse charge. For this, please mail your VAT number to Bart@BlueCourses.com. Part of our course revenue is used to fund organizations involved in protecting and cleaning our oceans. See our about page to learn more about our mission statement.

Requirements

Before subscribing to this course, you should have a basic understanding of the web. Some familiarity with HTML will be especially helpful.

Course Outline

  • Chapter 1: Introduction
    • About
    • What is web scraping?
    • Which use cases does web scraping enable?
    • Some introductory examples
    • Setting up
  • Chapter 2: HTTP with Python and Requests
    • What happens in a web browser
    • ☞ What goes on behind the browser
    • The HyperText Transfer Protocol
    • ☞ Talking HTTP with Python
    • Python HTTP libraries
    • ☞ Requests hands-on
    • Using Requests
    • ☞ Getting the weather with Requests
    • Quiz
  • Chapter 3: Parsing HTML and CSS with Beautiful Soup
    • Parsing HTML
    • Using Beautiful Soup
    • ☞ Beautiful Soup hands-on
    • Cascading Style Sheets
    • CSS selectors in Beautiful Soup
    • ☞ Further Beautiful Soup examples
    • Quiz
  • Chapter 4: Delving Deeper in HTTP
    • Forms and POST data
    • POST requests with Requests
    • ☞ POST requests hands-on
    • Other HTTP methods
    • HTTP headers
    • Cookies
    • ☞ Cookies hands-on
    • Cookies in Requests
    • Sessions in Requests
    • Other content: binary files, JSON
    • ☞ JSON hands-on
    • Quiz
  • Chapter 5: Dealing with JavaScript with Selenium
    • JavaScript
    • Selenium
    • ☞ Selenium examples
    • Quiz
  • Chapter 6: Closing topics
    • Other Python libraries and tools
    • Other programming languages
    • Command line tools
    • News articles
    • Commercial products
    • Web scraping vs. web crawling
    • Web scraping vs. AI and ML
    • Web scraping vs. RPA
    • <blank> scraping
    • Legal concerns
    • Web scraping as part of data science
    • The cat and mouse game
    • Closing best practices
    • Quiz

Course Staff

Prof. dr. Seppe vanden Broucke

Seppe vanden Broucke is an assistant professor at the department of Business Informatics at UGent (Belgium) and a lecturer at KU Leuven (Belgium). His research interests include business data mining and analytics, machine learning, process management, and process mining. His work has been published in well-known international journals and presented at top conferences. He is also the author of the books Beginning Java Programming (Wiley, 2015) and Principles of Database Management (Cambridge University Press, 2018). Seppe's teaching includes Advanced Analytics, Big Data, and Information Management courses. He also frequently teaches for industry and business audiences. See seppe.net for further details.
