Training a Decision Tree or a Random Forest on a classification problem, and comparing the latter with AdaBoost

Author: Prof. Fabien MOUTARDE, Center for Robotics, MINES ParisTech, PSL Université Paris

1. Decision Trees with SciKit-Learn on a very simple dataset

We will first work on a very simple classic dataset: Iris, a classification problem in which the sub-species of an iris flower must be determined from a few geometric characteristics of the flower.

Please FIRST READ the Iris DATASET DESCRIPTION. In this classification problem, there are 3 classes, with a total of 150 examples (each one with 4 input features). Please now execute the code cell below to load and view the dataset.
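For reference, a minimal sketch of what such a loading cell may look like (the actual cell is provided in the notebook; the printing choices here are ours):

```python
from sklearn.datasets import load_iris

# Load the Iris dataset: 150 examples, 4 features, 3 classes
iris = load_iris()
print(iris.DESCR[:500])    # beginning of the dataset description
print(iris.data.shape)     # (150, 4)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```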

Building, training and evaluating a simple Decision Tree classifier

The SciKit-learn class for Decision Tree classifiers is sklearn.tree.DecisionTreeClassifier.

Please FIRST READ (and understand!) the DecisionTreeClassifier DOCUMENTATION to understand all parameters of the constructor.

You can then begin by running the code block below, in which the default set of parameter values is used. If the graphical view works, look at the structure of the learnt decision tree.
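If you need a starting point, a sketch along these lines trains a default tree and displays it (the train/test split ratio and random_state below are our own arbitrary choices, not necessarily the notebook's):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Default hyperparameters: Gini criterion, no depth limit, etc.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
print("Train accuracy:", clf.score(X_train, y_train))
print("Test accuracy: ", clf.score(X_test, y_test))

# Graphical view of the learnt decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()
```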

Then, check the influence of the MAIN hyperparameters of the Decision Tree classifier, i.e. (typically) the split criterion, the maximum depth (max_depth), and the minimum number of samples per leaf (min_samples_leaf); a short illustrative sketch is given after the NB below.

NB: note that post-training PRUNING was long absent from SciKit-Learn Decision Trees; recent versions (>= 0.22) provide only minimal cost-complexity pruning, via the ccp_alpha parameter of the constructor.
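As an illustration of how such a parameter study might look (the parameter studied and the value range are our own choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=0)

# Illustrative study of the influence of max_depth (values chosen arbitrarily)
for depth in [1, 2, 3, 5, 10, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={clf.score(X_train, y_train):.3f}, "
          f"test={clf.score(X_test, y_test):.3f}")
```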

2. Decision Trees on a MORE REALISTIC DATASET: HANDWRITTEN DIGITS

Please FIRST READ the Digits DATASET DESCRIPTION.

In this classification problem, there are 10 classes, with a total of 1797 examples (each one being a 64D vector corresponding to an 8x8 pixmap). Please now execute the code cell below to load the dataset, visualize a typical example, and train a Decision Tree on it. The original code uses a deliberately SUBOPTIMAL set of learning hyperparameter values, which reaches only ~66% test accuracy. Try to play with them in order to improve accuracy.
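A sketch of such a cell might look as follows; the "suboptimal" hyperparameter values below are our own guess (the notebook's actual values may differ):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()   # 1797 examples, 64D vectors (8x8 pixmaps), 10 classes

# Visualize a typical example
plt.imshow(digits.images[0], cmap="gray")
plt.title(f"label = {digits.target[0]}")
plt.show()

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# Deliberately weak settings, to mimic the suboptimal starting point
clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
print(confusion_matrix(y_test, clf.predict(X_test)))
```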

Question: According to the confusion matrices, which digits are most often confused with each other?

Answer:

Finally, find somewhat optimized values for the set of 3 main hyper-parameters for DecisionTree learning, by using GRID-SEARCH WITH CROSS-VALIDATION (see the cross-validation example from the Multi-Layer Perceptron notebook used in an earlier practical session). Put the code in the cell below:
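A possible sketch, assuming the three hyper-parameters retained are criterion, max_depth, and min_samples_leaf (the grids of values are our own choices):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.3, random_state=0)

# Grid-search with 5-fold cross-validation on the TRAINING set only
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 10, 15, 20, None],
    "min_samples_leaf": [1, 2, 5, 10],
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))  # best model, refit on full train set
```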

Question: What is the best TEST accuracy you managed to reach with your DecisionTree after properly grid-searching its hyper-parameters using cross-validation?

Answer:

In order to improve the results, the most natural step is to combine SEVERAL decision trees, using the ensemble model called Random Forest: see below.

3. Building, training and evaluating a Random Forest classifier

The SciKit-learn class for Random Forest classifiers is sklearn.ensemble.RandomForestClassifier.

Please FIRST READ (and understand!) the RandomForestClassifier DOCUMENTATION to understand all parameters of the constructor.

Then you can begin by running the code block below, in which the default set of parameter values is used. As you will see, a RandomForest (even a rather small one) can easily outperform a single Decision Tree.
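As a starting point, a minimal sketch with default parameters on the Digits dataset (the split choices are ours):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.3, random_state=0)

# Default Random Forest (100 trees in recent scikit-learn versions)
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))  # typically far above a single tree
```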

Then, check the influence of the MAIN hyperparameters of the Random Forest classifier, i.e. (typically) the number of trees (n_estimators), the maximum depth of each tree (max_depth), and the number of features considered at each split (max_features).

Finally, find somewhat optimized values for the set of 3 main hyper-parameters for RandomForest, by using CROSS-VALIDATION (see the cross-validation example from the Multi-Layer Perceptron notebook used in an earlier practical session). Put the code in the cell below:
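A possible sketch, assuming the three hyper-parameters retained are n_estimators, max_depth, and max_features (the grids of values are our own choices):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.3, random_state=0)

# Illustrative grid over the three main Random Forest hyperparameters
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 20, None],
    "max_features": ["sqrt", "log2", None],
}
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))
```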

Question: What is the best TEST accuracy you managed to reach with your RandomForest after properly grid-searching its hyper-parameters using cross-validation?

Answer:

4. Building, training and evaluating an AdaBoost classifier

The SciKit-learn class for AdaBoost is sklearn.ensemble.AdaBoostClassifier.

Please FIRST READ (and understand!) the AdaBoostClassifier DOCUMENTATION to understand all parameters of the constructor.

Then begin by running the code block below, in which a default set of parameter values has been used.
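A minimal sketch of such a cell, including the training curves referred to in the question further below (n_estimators=200 and the split are our arbitrary choices):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.3, random_state=0)

# Default weak classifier: a depth-1 decision tree ("decision stump")
ada = AdaBoostClassifier(n_estimators=200, random_state=0)
ada.fit(X_train, y_train)

# Training curves: error after each boosting iteration, via staged_score
train_err = [1 - s for s in ada.staged_score(X_train, y_train)]
test_err = [1 - s for s in ada.staged_score(X_test, y_test)]
plt.plot(train_err, label="train error")
plt.plot(test_err, label="test error")
plt.xlabel("boosting iteration")
plt.legend()
plt.show()
```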

Then, check the influence of the MAIN hyperparameters of the AdaBoost classifier, i.e. (typically) the number of boosting rounds (n_estimators), the learning_rate, and the type/complexity of the weak classifier.

Finally, check which other types of classifiers can be used as the Weak Classifier with the AdaBoost implementation of SciKit-Learn. NB: in principle it is possible to use MLP classifiers as weak classifiers, but not with the SciKit-Learn implementation of MLPClassifier, because its fit method does not handle the per-example weighting (sample_weight) that AdaBoost requires.
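One way to check programmatically whether a given classifier can serve as a weak learner is to test whether its fit method accepts a sample_weight argument, e.g. with scikit-learn's has_fit_parameter utility. The example below is a hypothetical illustration with logistic regression as the weak classifier:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.utils.validation import has_fit_parameter

# A classifier is usable as weak learner only if its fit() accepts sample_weight
print(has_fit_parameter(LogisticRegression(), "sample_weight"))  # True
print(has_fit_parameter(MLPClassifier(), "sample_weight"))       # False -> not usable

# Hypothetical example with logistic regression as the weak classifier;
# recent scikit-learn uses the `estimator` argument (formerly `base_estimator`)
ada = AdaBoostClassifier(estimator=LogisticRegression(max_iter=1000),
                         n_estimators=20, random_state=0)
# ...then fit/score it like any other classifier
```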

Question: Looking at the training curves, you can see that the training error drops to zero rather quickly, while the test error continues to decrease with further iterations even after the training error has reached zero. Is this normal, and why? (check the course!)

Answer:

Now, for the case of DecisionTree weak classifiers, find somewhat optimized values of (max_depth, n_estimators) by using CROSS-VALIDATION. Put the code below:
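A possible sketch; note the nested parameter syntax `estimator__max_depth`, which reaches into the base tree (the grids of values are our own choices):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.3, random_state=0)

# `estimator__max_depth` tunes the weak tree inside the ensemble
# (use `base_estimator__max_depth` on older scikit-learn versions)
param_grid = {
    "estimator__max_depth": [1, 2, 3, 5],
    "n_estimators": [50, 100, 200],
}
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=0)
grid = GridSearchCV(ada, param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))
```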

Question: What is the best TEST accuracy you managed to reach with your AdaBoostClassifier after properly grid-searching its hyper-parameters using cross-validation?

Answer: