Ensemble Methods


Plan

  1. Decision Trees

  2. Bagging (e.g. RandomForest)

  3. Boosting (e.g. GradientBoosting)

  4. Stacking

1. Decision Trees

Decision Trees are hierarchical supervised learning algorithms

They primarily help us with:

- Classification and Regression

- Non-linear & Non-parametric Modelling

- Feature Selection

How Do We “Grow” a Tree?

  • Trees use a greedy divide-and-conquer strategy to split the dataset on conditions of the form (feature, threshold).
  • They learn simple if-else decision rules inferred from the data features.
  • The algorithm selects the “best” condition according to a specific score.
  • The algorithm recursively splits each resulting subset of the data.
  • When no satisfactory condition is found, the node becomes a leaf.

DecisionTreeClassifier

At each node, select the split that results in the most homogeneous sub-nodes
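
As a minimal sketch, here is how such a tree could be grown with scikit-learn's DecisionTreeClassifier (the iris dataset and the hyperparameters are only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: any tabular classification dataset would work
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion="gini" (the default) means each split minimizes the Gini impurity
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))  # accuracy on unseen data
print(tree.predict(X_test[:5]))    # class of the leaf each point ends up in
```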

Gini Index

The Gini index measures the ability of a split to separate the classes. It quantifies the impurity of each node as a value in [0, 1]: the lower, the better.

\[ Gini(\text{node}) = 1 - \sum_{i} p_i^2 \]

where \(p_i\) is the proportion of observations of class \(i\) among the observations remaining at the node.
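
For example, a node containing 8 observations of class A and 2 of class B has

\[ Gini = 1 - (0.8^2 + 0.2^2) = 1 - 0.68 = 0.32 \]

whereas a pure node (a single class) has a Gini index of 0.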

Predicting

A new point is passed through the tree from top to bottom until it reaches a leaf. The most represented class in that leaf is the predicted class for that point.

Visual Intro

DecisionTreeRegressor
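
A sketch of the regression counterpart, which splits on reduction of the squared error and predicts the mean of each leaf (the synthetic data below is only illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# The regression tree yields a piecewise-constant approximation of the curve
reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X, y)

print(reg.predict([[2.5]]))  # mean of the training targets in the matching leaf
```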

Ensemble Methods

Ensemble methods combine several decision trees to achieve better predictive performance than a single decision tree.

2. Bagging (i.e. Bootstrap Aggregating)

Bootstrap aggregating, also known as Bagging, is the aggregation of multiple versions of a model

  • It is a parallel ensemble method

  • The aim of bagging is to reduce variance

  • Each version of the model is called a weak learner

  • Weak learners are trained on bootstrapped samples of the dataset

Bootstrapping (or Generating Bootstrapped Samples)

  • The samples are created by randomly drawing data points, with replacement

  • Features can also be randomly filtered to increase bagging diversity
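
A sketch of what one bootstrapped sample looks like with NumPy (the array sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 10, 4
X = rng.normal(size=(n_samples, n_features))

# Bootstrapped sample: draw n_samples row indices *with replacement*
row_idx = rng.choice(n_samples, size=n_samples, replace=True)

# Optional feature subsampling: keep a random subset of columns (without replacement)
col_idx = rng.choice(n_features, size=2, replace=False)

X_boot = X[np.ix_(row_idx, col_idx)]
print(sorted(row_idx))  # duplicated indices show rows drawn several times
```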

Random Forests = Bagged Trees

Random Forests are a bagged ensemble of Decision Trees

Predictions are averaged in regression tasks and decided by majority vote in classification tasks
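
A minimal Random Forest sketch with scikit-learn (the dataset and hyperparameters are illustrative; RandomForestRegressor is the averaging counterpart for regression):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# n_estimators = number of bagged trees; max_features="sqrt" adds feature subsampling
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)

print(cross_val_score(forest, X, y, cv=5).mean())  # majority vote over the trees
```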

Pros and Cons of Bagging

👍 Advantages:

  • Reduces variance (overfitting)

  • Can be applied to any model (see the sketch after this list)

👎 Disadvantages:

  • Complex structure

  • Long training time

  • Disregards the performance of individual sub-models
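
As noted in the advantages above, bagging is not limited to trees. A sketch wrapping a (scaled) logistic regression in scikit-learn's BaggingClassifier; the estimator keyword assumes scikit-learn ≥ 1.2 (it was called base_estimator before):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Each weak learner is a logistic regression trained on a bootstrapped
# sample containing 80% of the rows (drawn with replacement).
bag = BaggingClassifier(
    estimator=make_pipeline(StandardScaler(), LogisticRegression()),
    n_estimators=20,
    max_samples=0.8,
    bootstrap=True,
    random_state=42,
)

print(cross_val_score(bag, X, y, cv=5).mean())
```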

3. Boosting

Boosting is a method that trains weak learners sequentially, each one learning from its predecessor’s mistakes

  • It is a sequential ensemble method

  • The aim of boosting is to reduce bias

  • Focuses on the observations that are harder to predict

  • The best weak learners are given more weight in the final vote
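
The reweighting-and-weighted-vote scheme described above is, for instance, what AdaBoost implements. A minimal sketch with scikit-learn (dataset and hyperparameters are illustrative; the estimator keyword assumes scikit-learn ≥ 1.2):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each "stump" is trained on a reweighted dataset that emphasises the
# previous mistakes; better stumps get a larger weight in the final vote.
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)

print(cross_val_score(boost, X, y, cv=5).mean())
```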

3.1 Gradient Boosting 🔥

In practice (e.g. in scikit-learn), it is implemented with decision trees as the weak learners

Instead of re-weighting the misclassified observations, Gradient Boosting will:

  1. Sequentially fit each weak learner \(d_{\text{tree } i}\) to predict the residuals left by the previous ones
  2. Sum the predictions of all weak learners

\[D(\mathbf{x}) = d_\text{tree 1}(\mathbf{x}) + d_\text{tree 2}(\mathbf{x}) + ...+ d_\text{tree n}(\mathbf{x})\]
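
A sketch of this residual-fitting loop with three shallow regression trees on synthetic data; scikit-learn's GradientBoostingRegressor automates exactly this (and adds a learning rate):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

trees, residuals = [], y.copy()
for _ in range(3):
    # Each shallow tree is fitted on the residuals left by the previous trees
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    residuals = residuals - tree.predict(X)
    trees.append(tree)

# D(x) = d_tree1(x) + d_tree2(x) + ... : sum the weak learners' predictions
D = sum(tree.predict(X) for tree in trees)
print(np.mean((y - D) ** 2))  # the MSE shrinks as trees are added
```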

Visual Explanation

XGBoost (eXtreme Gradient Boosting)

  • Dedicated library, optimized for this task

  • Nice features inspired by Deep Learning, such as early stopping on a validation set

First, let’s split the dataset into train, validation, and test sets
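
A sketch of the split plus an early-stopping fit, assuming the xgboost package is installed (the dataset, split proportions, and hyperparameters are illustrative; early_stopping_rounds is a constructor argument in recent xgboost versions, it used to be passed to fit):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# 70% train / 15% validation / 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Early stopping (borrowed from Deep Learning): stop adding trees once the
# validation score has not improved for 10 consecutive rounds.
model = XGBClassifier(n_estimators=500, learning_rate=0.1, early_stopping_rounds=10)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print(model.score(X_test, y_test))  # final evaluation on the held-out test set
```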