MNIST Dataset Classification

Import Data
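A minimal loading sketch, assuming scikit-learn's fetch_openml helper (the notes don't show the actual loading code):

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Download MNIST from OpenML: 70,000 images, 28x28 = 784 pixels each
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)  # labels arrive as strings; convert to integers
```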

Quick Look
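A quick matplotlib sketch to display one instance (assumes X from the import step):

```python
import matplotlib.pyplot as plt

# Reshape one 784-pixel row back into a 28x28 image and display it
some_digit = X[0]
plt.imshow(some_digit.reshape(28, 28), cmap="binary")
plt.axis("off")
plt.show()
```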

Split Data

The MNIST dataset comes already split into training and test sets (the first 60,000 images and the last 10,000, respectively).
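In code (assuming X and y from the import step):

```python
# The dataset is already ordered: first 60,000 train, last 10,000 test
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
```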

Shuffle Training Set

Shuffling guarantees that the cross-validation folds will all contain a similar mix of digits, and it reduces the risk of the model seeing many similar instances in a row (some algorithms are sensitive to instance order).
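A sketch with NumPy (assumes the split from the previous step):

```python
import numpy as np

# Reorder the training set randomly
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]
```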

Training a Binary Classifier

This will train a binary classifier that distinguishes 9s from non-9s.
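The notes don't name a model, so here's a sketch with SGDClassifier:

```python
from sklearn.linear_model import SGDClassifier

# Binary targets: True for 9s, False for all other digits
y_train_9 = (y_train == 9)
y_test_9 = (y_test == 9)

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_9)
```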

Performance Measures

Accuracy is usually not the preferred performance measure for classifiers, especially on skewed datasets: a model that always guessed "not 9" would still be right about 90% of the time here.

cross_val_predict performs K-fold cross-validation and returns the predictions made on each test fold instead of the evaluation scores. This gives you a clean (out-of-sample) prediction for each training instance.
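For example, paired with a confusion matrix (a sketch using the classifier above):

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# One out-of-sample prediction per training instance
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_9, cv=3)

# Rows are actual classes, columns are predicted classes
confusion_matrix(y_train_9, y_train_pred)
```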

precision_score( ) returns TP / (TP + FP)

When it claims an image represents a 9, it's correct 67% of the time

recall_score( ) returns TP / (TP + FN)

It only detects 64% of the 9s

The F1 score is the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall)
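All three metrics in code (the 67% and 64% figures above came from one particular run):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

precision_score(y_train_9, y_train_pred)  # TP / (TP + FP)
recall_score(y_train_9, y_train_pred)     # TP / (TP + FN)
f1_score(y_train_9, y_train_pred)         # harmonic mean of the two
```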

Precision and recall trade off against each other

Raising the decision threshold increases precision; lowering it increases recall

Scikit-Learn does not let you set the threshold directly, but it exposes the decision scores through decision_function(), which you can compare against any threshold you like.
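For instance (the threshold value below is made up for illustration):

```python
# Raw score for one instance; predict() implicitly uses threshold 0
y_score = sgd_clf.decision_function([some_digit])

threshold = 8000  # made-up value: raising it trades recall for precision
y_some_digit_pred = (y_score > threshold)
```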

Choose Threshold
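One common recipe (a sketch assuming sgd_clf, X_train, y_train_9 from above; the 90% precision target is just an example): score every instance, then pick the lowest threshold that hits the target precision.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Decision scores (not predictions) for every training instance
y_scores = cross_val_predict(sgd_clf, X_train, y_train_9, cv=3,
                             method="decision_function")
precisions, recalls, thresholds = precision_recall_curve(y_train_9, y_scores)

# Lowest threshold reaching at least 90% precision (illustrative target)
threshold_90 = thresholds[np.argmax(precisions >= 0.90)]
y_train_pred_90 = (y_scores >= threshold_90)
```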

The ROC Curve

The ROC curve plots the true positive rate (recall) against the false positive rate. The dotted diagonal is a completely random classifier. You want your classifier's curve to be as far toward the top-left corner as possible.

You want the area under the curve (AUC) to be as close to 1 as possible (a purely random classifier scores 0.5).
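A plotting sketch using roc_curve and roc_auc_score (assumes the y_scores computed above):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, thresholds = roc_curve(y_train_9, y_scores)

plt.plot(fpr, tpr)               # the classifier's ROC curve
plt.plot([0, 1], [0, 1], "k--")  # the random-classifier diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.show()

roc_auc_score(y_train_9, y_scores)  # 1.0 = perfect, 0.5 = random
```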

Precision/Recall or ROC?

Use the precision/recall curve when the positive class is rare or when you care more about false positives than false negatives; use the ROC curve otherwise.

Multiclass Classification (multinomial classifiers)

One vs All (OvA): one binary classifier per class (N classifiers)
One vs One (OvO): one binary classifier for each pair of classes, N * (N-1) / 2 in total

Scikit-Learn defaults to OvA, except for SVM classifiers, for which it uses OvO

Manually use OvO instead of OvA
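A sketch: wrap the estimator in OneVsOneClassifier:

```python
from sklearn.multiclass import OneVsOneClassifier
from sklearn.linear_model import SGDClassifier

# 10 classes -> 10 * 9 / 2 = 45 binary classifiers under the hood
ovo_clf = OneVsOneClassifier(SGDClassifier(random_state=42))
ovo_clf.fit(X_train, y_train)
len(ovo_clf.estimators_)  # 45
```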

Random Forest classifiers don't need OvO or OvA because they can classify instances into multiple classes directly
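For instance (a sketch with default hyperparameters):

```python
from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(random_state=42)
forest_clf.fit(X_train, y_train)
forest_clf.predict_proba([some_digit])  # one probability per class, no OvO/OvA
```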

Normally we would have done this preprocessing before any training, but we're going to scale the inputs now:
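A sketch with StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Standardize the pixel values (zero mean, unit variance per feature)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")
```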

Error Analysis

Normally we would have already spent some time optimizing the model; here we'll assume that has been done.

Compare error rates rather than absolute numbers of errors, so that classes with many images don't look unfairly error-prone (see the sketch below).
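A sketch (assumes sgd_clf and the scaled inputs from above): normalize each row of the confusion matrix by that class's image count, zero out the diagonal, and plot only the errors.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(sgd_clf, X_train_scaled, y_train, cv=3)
conf_mx = confusion_matrix(y_train, y_train_pred)

# Divide each row by that class's image count: error rates, not raw counts
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums

np.fill_diagonal(norm_conf_mx, 0)  # keep only the errors
plt.matshow(norm_conf_mx, cmap=plt.cm.gray)
plt.show()
```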

Multilabel Classification
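A multilabel system outputs several binary labels per instance. A sketch with KNeighborsClassifier and two illustrative labels ("large digit", i.e. 7 or above, and "odd digit"):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two labels per image: "large digit" (7, 8, or 9) and "odd digit"
y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)
knn_clf.predict([some_digit])  # one boolean per label
```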

Multioutput Classification

Example: add random noise to the images, then train a system to output the clean image. Each output is one pixel whose label can take many values, so this generalizes multilabel classification.
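A sketch of that noise-removal system with KNeighborsClassifier:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Add random pixel noise to the inputs; the clean images become the targets
noise = np.random.randint(0, 100, (len(X_train), 784))
X_train_mod = X_train + noise
y_train_mod = X_train  # target: the original, noise-free image

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_mod, y_train_mod)
clean_digit = knn_clf.predict([X_train_mod[0]])  # denoised image
```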