Machine Learning Essentials

In this course, participants learn the essentials of machine learning.

About This Course

This course covers the essentials of machine learning. We start with an introduction to machine learning and its applications, then discuss data preprocessing and feature engineering, both essential steps in building high-performing machine learning models. This is followed by the basic concepts of regression and classification, and by how to measure the performance of predictive analytics techniques. Next, we zoom in on association rules, sequence rules, and clustering.

We then elaborate on advanced machine learning techniques such as neural networks, support vector machines, and ensemble models, and review Bayesian networks as probabilistic white-box machine learning models. A subsequent section covers variable selection, after which we extensively discuss machine learning model interpretation and deployment. The course concludes by highlighting some machine learning pitfalls.

Throughout, the course provides a sound mix of theoretical and technical insights as well as practical implementation details, illustrated by several real-life case studies and examples. It features code examples in both R and Python, and the instructors extensively draw upon their research and industry experience.

The course features more than 8 hours of video lectures, numerous multiple-choice questions, and various references to background literature. A certificate signed by the instructors is provided upon successful completion.

Price

The enrollment fee for this course is EUR 250 per participant. Payments are handled securely by PayPal. Part of our course revenue goes towards funding organizations involved in ocean cleanup; see our about page to learn more about our mission.

Watch this sample lecture video on YouTube for a free preview of the course contents.

Requirements

Before subscribing to this course, you should have a basic understanding of descriptive statistics (e.g., mean, median, standard deviation, histograms, scatter plots, etc.) and inference (e.g., confidence intervals, hypothesis testing). Previous R and Python experience is helpful but not necessary.
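As a quick self-assessment of the statistics prerequisite: you should be comfortable reading a short snippet like the one below, which computes the descriptive statistics mentioned above using Python's standard library (the data values are made up for illustration; this is not course material):

```python
import statistics

# A small, made-up sample of observations
data = [12, 15, 11, 19, 14, 15, 20, 13]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted sample
stdev = statistics.stdev(data)    # sample standard deviation

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
```

If code like this looks unfamiliar, that is fine: previous R and Python experience is helpful but not necessary, as long as the statistical concepts themselves are clear.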

Course Outline

  • Introduction
    • Instructor team
    • Our Machine Learning Publications
    • Software
    • R/Python tutorials
    • Data sets
    • Disclaimer
  • Introduction to Machine Learning
    • Machine Learning
    • Machine Learning Examples
    • Machine Learning Process Model
    • Types of Machine Learning
    • Quiz
  • Data Preprocessing
    • Motivation
    • Types of data
    • Types of variables
    • Denormalizing data
    • Sampling
      • Sampling in R
      • Sampling in Python
    • Visual data exploration
      • Visual data exploration in R
      • Visual data exploration in Python
    • Descriptive statistics
    • Missing values
      • Missing values in R
      • Missing values in Python
    • Outliers
      • Outliers in R
      • Outliers in Python
    • Categorization
      • Categorization in R
      • Categorization in Python
    • WOE and IV
      • WOE and IV in R
      • WOE and IV in Python
    • Quiz
  • Feature Engineering
    • Feature Engineering Defined
    • RFM features
    • Trend features
    • Logarithmic transformation
    • Power transformation
    • Box-Cox transformation
      • Box-Cox transformation in R
      • Box-Cox transformation in Python
    • Yeo-Johnson transformation
    • Performance Optimization
      • Performance Optimization with Yeo-Johnson transformation in R
      • Performance Optimization with Yeo-Johnson transformation in Python
    • Principal Component Analysis
    • t-SNE
    • Quiz
  • Regression
    • Linear Regression
      • Linear Regression in R
      • Linear Regression in Python
    • High Dimensional Data
    • Ridge Regression
      • Ridge Regression in R
      • Ridge Regression in Python
    • LASSO Regression
      • LASSO Regression in R
      • LASSO Regression in Python
    • Elastic Net
      • Elastic net in R
      • Elastic net in Python
    • Principal Component Regression
    • Partial Least Squares (PLS) regression
    • Generalized Linear Models (GLMs)
    • Generalized Additive Models (GAMs)
  • Classification
    • Linear Regression
    • Logistic Regression
      • Logistic Regression in R
      • Logistic Regression in Python
    • Nomograms
      • Nomograms in R
    • Decision trees
      • Decision trees in R
      • Decision trees in Python
    • K-nearest neighbor
      • K-nearest neighbor in R
      • K-nearest neighbor in Python
    • Multiclass classification
    • One versus One coding
    • One versus All coding
    • Multiclass decision trees
  • Measuring the Performance of Predictive Analytics Techniques
    • Performance measurement
    • Split sample method
    • Cross-validation
    • Single sample method
    • Performance measures for classification
    • Confusion matrix (classification accuracy, classification error, sensitivity, specificity)
    • ROC curve and area under ROC curve
      • ROC curve in R
      • ROC curve in Python
    • CAP curve and Accuracy Ratio
    • Lift curve
    • Kolmogorov-Smirnov distance
    • Mahalanobis distance
    • Performance measures for regression
    • Quiz
  • Association and Sequence Rules
    • Association Rules
    • Support and Confidence
    • Association rule mining
      • Association rule mining in R
      • Association rule mining in Python
    • Lift
    • Association rule extensions
    • Post-Processing Association Rules
    • Association rules applications
    • Sequence rules
    • Quiz
  • Clustering Techniques
    • Hierarchical clustering
      • Hierarchical clustering in R
      • Hierarchical clustering in Python
    • K-means clustering
      • K-means clustering in R
      • K-means clustering in Python
    • DBSCAN
      • DBSCAN in R
      • DBSCAN in Python
    • Evaluating clustering solutions
    • Quiz
  • Neural Networks
    • Neural Networks
      • Neural Networks in R
      • Neural Networks in Python
    • Deep Learning Neural Networks
    • Opening the Neural Network Black Box
    • Variable Selection
    • Rule Extraction
    • Decompositional Rule Extraction
    • Pedagogical Rule Extraction
    • Quality of Extracted Rule Set
    • Rule Extraction Example
    • Two-Stage Model
    • Self-Organizing Maps
      • SOMs in R
    • Self-Organizing Maps Example
    • Self-Organizing Maps Evaluated
    • Quiz
  • Support Vector Machines (SVMs)
    • Problems with neural networks
    • Linear programming
    • Linearly separable case
    • Linearly non-separable case
    • Non-linear SVM classifier
      • RBF SVM in R
      • RBF SVM in Python
    • Kernel functions
    • Neural Network Interpretation of SVM classifier
    • Tuning the hyperparameters
      • Tuning the hyperparameters of an RBF SVM in R
      • Tuning the hyperparameters of an RBF SVM in Python
    • Benchmarking study
    • SVMs for regression
    • One-class SVMs 
      • One-class SVM in R
      • One-class SVM in Python
    • Extensions to SVMs
    • Opening the SVM black box
    • Quiz
  • Ensemble Methods
    • Ensemble methods
    • Bootstrapping
    • Bagging
      • Bagging in R
      • Bagging in Python
    • Boosting
      • Adaboost in R
      • Adaboost in Python
    • Random Forests
      • Random Forests in R
      • Random Forests in Python
    • XGBoost
      • XGBoost in R
      • XGBoost in Python
    • Quiz
  • Bayesian Networks
    • Bayesian Networks
    • Example Bayesian Network Classifier
    • Naive Bayes Classifier
      • Naive Bayes classifier in R
      • Naive Bayes classifier in Python
    • Tree Augmented Naive Bayes Classifiers
    • Bayesian networks examples
    • Quiz
  • Variable Selection
    • Variable selection
    • Filter methods (information gain, Cramer’s V, Fisher score)
      • Cramer’s V in R
      • Cramer’s V in Python
      • Information Value in R
      • Information Value in Python
    • Forward/Backward/Stepwise regression
      • Forward/Backward/Stepwise in R
    • BART: Backward Regression Trimming
      • BART variable selection in R
    • Criteria for variable selection
    • Quiz
  • Model interpretation
    • Model interpretation
    • Feature Importance
    • Permutation based feature importance
    • Partial dependence plots
      • Partial dependence plots in Python
    • Individual conditional expectation (ICE) plots
      • ICE plots in Python
    • Visual analytics
    • Decision tables
    • LIME
      • LIME in Python
    • Shapley value
      • Shapley value in Python
  • Model deployment
    • Model deployment
    • Model governance
    • Model ethics
    • Model documentation
    • Model backtesting
    • Model benchmarking
    • Model stress testing
    • Privacy and Security
    • Quiz
  • Machine Learning Pitfalls
    • Sample bias
    • Model risk
    • Deep everything
    • Leader versus follower
    • Complexity versus trust
    • Statistical myopia
    • Profit Driven Machine Learning
    • Quiz
  • Quiz
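To give a small taste of the hands-on style of the code lectures listed above, here is a minimal pure-Python sketch of the confusion matrix and the classification measures covered in the performance-measurement module (the labels and predictions are invented for illustration; actual course examples use real-life data sets):

```python
# Hypothetical ground-truth labels and model predictions (1 = positive class)
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Tally the four cells of the confusion matrix
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives

accuracy    = (tp + tn) / len(actual)  # fraction of correct predictions
sensitivity = tp / (tp + fn)           # true positive rate (recall)
specificity = tn / (tn + fp)           # true negative rate

print(accuracy, sensitivity, specificity)
```

The course itself derives these and the other performance measures (ROC, CAP, lift) step by step, in both R and Python.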

Course Staff

Prof. dr. Bart Baesens

Bart was born in Bruges (West Flanders, Belgium) on February 27th, 1975. He speaks West-Flemish (which he is very proud of!), Dutch, French, a bit of German, some English and can order a beer in Chinese. He is married to Katrien Denys and has 3 kids (Ann-Sophie, Victor and Hannelore), and 2 cats (Felix and Simba). Besides enjoying time with his family, he is also a diehard Club Brugge soccer fan. Bart is a foodie and amateur cook. He loves drinking a good glass of wine (his favorites are white Viognier or red Cabernet Sauvignon) either in his wine cellar or when overlooking the authentic red English phone booth in his garden. His favorite pub is “In den Rozenkrans” in Kessel-Lo (close to Leuven) where you will often find him having a Gueuze Girardin 1882 or Tripel Karmeliet with a spaghetti of the house. Bart loves traveling and his favorite cities are San Francisco, Sydney and Barcelona. He is fascinated by World War I and reads many books on the topic. He is not a big fan of being called professor Baesens (or even worse, professor Baessens), shopping (especially for clothes or shoes), pastis (or other anise-flavored drinks), vacuum cleaning (he can’t bear the sound), students chewing gum during their oral exam of Credit Risk Modeling (or who had garlic for breakfast), long meetings (> 30 minutes), phone calls (asynchronous e-mail communication is a lot more efficient!), admin (e.g., forms and surveys) or French fries (Belgian fries are a lot better!). He is often praised for his sense of humor, although he is usually more modest about this. Bart is also a professor of Big Data and Analytics at KU Leuven (Belgium) and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on Big Data & Analytics, Credit Risk Modeling, Fraud Detection and Marketing Analytics.
He has written more than 250 scientific papers, some of which have been published in well-known international journals (e.g., MIS Quarterly, Machine Learning, Management Science, MIT Sloan Management Review and IEEE Transactions on Knowledge and Data Engineering) and presented at top international conferences (e.g., ICIS, KDD, CAISE). He has received various best paper and best speaker awards. Bart is the author of 8 books: Credit Risk Management: Basic Concepts (Oxford University Press, 2009), Analytics in a Big Data World (Wiley, 2014), Beginning Java Programming (Wiley, 2015), Fraud Analytics using Descriptive, Predictive and Social Network Techniques (Wiley, 2015), Credit Risk Analytics (Wiley, 2016), Profit Driven Business Analytics (Wiley, 2017), Web Scraping for Data Science with Python (Apress, 2018), and Principles of Database Management (Cambridge University Press, 2018). He has sold more than 25,000 copies of these books worldwide, some of which have been translated into Chinese, Russian and Korean. His research is summarized at www.dataminingapps.com. For an overview of the courses he is teaching, see www.bartbaesens.com. He also regularly tutors, advises and provides consulting support to international firms regarding their big data, analytics and credit risk management strategy.

Prof. dr. Tim Verdonck

Tim was born in Merksem (Antwerp, Belgium) on February 19, 1983. He lives together with his girlfriend Nuria Baeten, his daughter Oona, his dog Ragna and two cats Nello and Patrasche (the names of the cats come from the novel A Dog of Flanders, which takes place in Hoboken and Antwerp, see www.visitantwerpen.be). He lives in Wilrijk (Antwerp, Belgium) and enjoys relaxing in his garden with his family. He loves travelling and his favorite cities are Barcelona and Vancouver. On holidays, he likes to dive (his favorite place is Sipadan in Malaysia), snowboard and wakeboard. His other favorite sports are tennis and football.

Tim Verdonck is also a professor of Statistics and Data Science at the Department of Mathematics of the University of Antwerp (Belgium). He is affiliated with KU Leuven and has been an invited professor at the University of Bologna, teaching advanced non-life insurance in the Master of Quantitative Finance. He is chairholder of the BNP Paribas Fortis Chair on Fraud Analytics, the Allianz Chair on Prescriptive Business Analytics in Insurance and the BASF Chair on Robust Predictive Analytics. Tim has a degree in Mathematics and a PhD in Science: Mathematics, obtained at the University of Antwerp. During his PhD he successfully completed the Master in Insurance and the Master in Financial and Actuarial Engineering, both at KU Leuven. His research interests are in the development and application of robust statistical methods for financial, actuarial and economic data sets. He is associate editor of Statistics: A Journal of Theoretical and Applied Statistics (Taylor & Francis) and Computational Statistics & Data Analysis (Elsevier). Tim is co-organizer of the Data Science Meetups in Leuven and managing partner at Boltzmann (www.boltzmann.be), a team of machine learning experts that transforms data into actionable insights.

Enroll