Decision Trees
Bagging (e.g. RandomForest)
Boosting (e.g. GradientBoosting)
Stacking
Decision Trees are hierarchical supervised learning algorithms
They primarily help us with:
- Classification and Regression
- Non-linear & Non-parametric Modelling
- Feature Selection
At each node, select the split that results in the most homogeneous sub-nodes
The Gini index measures how well each feature separates the data. It quantifies the impurity of each node on a scale from 0 to 1 - the lower, the better.
\[ Gini(\text{node}) = 1 - \sum_{i} p_i^2 \]
where \(p_i\) is the ratio of the observations belonging to class \(i\) to the total number of observations remaining at the given node.
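For example, a node containing 8 observations of class A and 2 of class B has \(Gini = 1 - (0.8^2 + 0.2^2) = 0.32\); a perfectly pure node has a Gini of 0.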
A new point is passed through the tree from top to bottom until it reaches a leaf. The most represented class in that leaf is the predicted class for that point.
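As an illustration, here is a minimal scikit-learn sketch that fits a tree with Gini-based splits and classifies a new point (the iris toy dataset and the hyperparameters are illustrative choices, not taken from these notes):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each split is chosen to minimise the Gini impurity of the resulting sub-nodes
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X, y)

# A new point travels from the root down to a leaf;
# the most represented class in that leaf is the prediction
new_point = [[5.1, 3.5, 1.4, 0.2]]
print(tree.predict(new_point))        # predicted class
print(tree.predict_proba(new_point))  # class proportions in the reached leaf
```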
Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree.
Bootstrap aggregating, also known as Bagging, is the aggregation of multiple versions of a model
It is a parallel ensemble method
The aim of bagging is to reduce variance
Each version of the model is called a weak learner
Weak learners are trained on bootstrapped samples of the dataset
The samples are created by randomly drawing data points, with replacement
Features can also be randomly filtered to increase bagging diversity
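A minimal sketch of bagging with scikit-learn's BaggingClassifier (hyperparameters are arbitrary, and the `estimator` argument name assumes a recent scikit-learn version; older ones call it `base_estimator`):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each weak learner is trained on a bootstrapped sample drawn with replacement;
# max_features < 1.0 also filters features randomly to increase diversity.
# Any estimator could be used in place of the decision tree.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,    # fraction of rows drawn (with replacement) per learner
    max_features=0.8,   # fraction of features given to each learner
    bootstrap=True,
    random_state=42,
)
bagging.fit(X, y)
```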
Random Forests are a bagged ensemble of Decision Trees
Predictions are averaged in regression tasks, and decided by majority vote in classification tasks
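A minimal Random Forest sketch (the dataset and hyperparameters are illustrative only):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# A bagged ensemble of decision trees: each tree sees a bootstrapped sample
# and considers a random subset of features at every split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)

# For classification the forest predicts by majority vote across its trees;
# RandomForestRegressor would average the trees' outputs instead
print(cross_val_score(forest, X, y, cv=5).mean())
```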
👍 Advantages:
Reduces variance (overfitting)
Can be applied to any model
👎 Disadvantages:
Complex structure
Long training time
Disregards the performance of individual sub-models
Boosting is a method designed to train weak learners that learn from their predecessor’s mistakes
It is a sequential ensemble method
The aim of boosting is to reduce bias
Focuses on the observations that are harder to predict
The best weak learners are given more weight in the final vote
In practice, implementations almost always use decision trees as the weak learners
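The weight-based scheme described above corresponds to AdaBoost; here is a minimal sketch with scikit-learn (hyperparameters are arbitrary, and the `estimator` argument name assumes a recent scikit-learn version):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each new stump focuses on the observations its predecessors misclassified;
# better-performing stumps receive a larger weight in the final vote
boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a decision "stump" as weak learner
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)
boosted.fit(X, y)
```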
Instead of updating the weights of observations that were misclassified, Gradient Boosting trains each new tree to predict the residual errors of the trees that came before it. The final prediction is the sum of the trees' outputs:
\[ D(\mathbf{x}) = d_\text{tree 1}(\mathbf{x}) + d_\text{tree 2}(\mathbf{x}) + \dots + d_\text{tree n}(\mathbf{x}) \]
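A minimal sketch with scikit-learn's GradientBoostingRegressor, where each successive tree is fitted to the residuals of the current ensemble (dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)

# Each new tree is trained to predict the residual errors of the previous trees;
# the final prediction D(x) is the (learning-rate-scaled) sum of all trees' outputs
gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=42)
gb.fit(X, y)
print(gb.predict(X[:1]))
```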
Dedicated libraries, highly optimized for gradient boosting, are available
They offer nice features inspired by Deep Learning
First, let’s split the dataset into train, test and validation sets
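One possible way to do it is with two successive calls to scikit-learn's train_test_split (the proportions are an arbitrary choice), followed by a fit with early stopping on the validation set. The dedicated library is assumed here to be XGBoost, which these notes do not name, and the placement of `early_stopping_rounds` depends on the XGBoost version:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumption: the "dedicated library" is XGBoost

X, y = load_breast_cancer(return_X_y=True)

# Carve out the test set first, then split the rest into train / validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.2, random_state=42)

# The validation set is used to stop adding trees once performance plateaus
model = XGBClassifier(n_estimators=1000, learning_rate=0.05, early_stopping_rounds=20)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print(model.score(X_test, y_test))
```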