Model Selection And Validation
For this assignment, you are to determine which model is best for prediction and report
the selected hyperparameters and the resulting accuracy for the Digit Recognition data set
that was used in the previous assignment.
As before, create a PDF of your notebook showing your steps, and include the completed
results table described at the end of this handout. Attach both the Jupyter notebook and
the PDF version. Also, be sure to include your name at the start of the notebook and at
the top of the PDF.
Specifically, you are to test the following models
Model | Hyperparameter | Testing range | Notes
Support Vector Machine | gamma: size of the kernel; C: slack variable | 10^(-x) for x = -5 to 5 | Use the 'rbf' (radial basis function) kernel.
K-nearest neighbors | k: number of neighbors | 1, 3, 5, 7, 9 | Use the sklearn function.
Decision Trees | min_samples_split | 3, 5, 7, 9 (1 was removed) | Use the defaults for the other hyperparameters.
Logistic Regression | C: inverse of the regularization strength (smaller = more regularization) | 10^(-x) for x = -5 to 5 | With the L1 penalty (Lasso).
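The testing ranges above could be written down as search grids along the following lines. This is only a sketch: the dictionary name `param_grids` and its keys are made up for illustration, it assumes the single range listed for the SVM applies to both gamma and C, and the `liblinear` solver is an assumption chosen because scikit-learn's default `lbfgs` solver does not support the L1 penalty.

```python
# 10^(-x) for x = -5 to 5 gives 11 values, from 1e5 down to 1e-5.
powers_of_ten = [10.0 ** (-x) for x in range(-5, 6)]

# Hypothetical container: one grid per model, keyed by a short label.
param_grids = {
    # Assumption: the same range is searched for both gamma and C.
    "svm": {"kernel": ["rbf"], "gamma": powers_of_ten, "C": powers_of_ten},
    "knn": {"n_neighbors": [1, 3, 5, 7, 9]},
    "tree": {"min_samples_split": [3, 5, 7, 9]},
    # Assumption: liblinear is used because it supports the L1 penalty.
    "logreg": {"C": powers_of_ten, "penalty": ["l1"], "solver": ["liblinear"]},
}

# The SVM grid has two hyperparameters, so it covers every (gamma, C) pair.
n_svm_combinations = len(param_grids["svm"]["gamma"]) * len(param_grids["svm"]["C"])
```

Each grid is in the form accepted by the `param_grid` argument of `GridSearchCV`, so the two-hyperparameter SVM search runs over all 121 combinations at once.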
Steps are as follows:
1. Separate your data into training and testing sets. We will use cross-validation over
the training set to select the hyperparameters.
a. Use train_test_split to create a separate training and test set.
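from sklearn.model_selection import train_test_split
# stratify expects the label array (not True) so class proportions are preserved
X_train, X_test, y_train, y_test = train_test_split(X,
y, stratify=y, test_size=0.20)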
b. For the training set, you have two choices to perform hyperparameter
selection.
i. Use cross-validation to evaluate each model variant and select the
best hyperparameters (standard practice, most recommended)
ii. Create a hold-out validation set and train on one portion of the data
and use the accuracy on the hold-out validation set to pick the right
hyperparameters (also valid)
2. Steps to turn in for the assignment
a. Train the four models with their default parameters. Report the resulting
accuracy of each model using the default parameters.
b. For each of the four models, find the hyperparameters giving the highest
accuracy on the validation set by performing an exhaustive grid search.
Report the hyperparameter values and accuracy on the validation
set.
i. Consider using sklearn.model_selection.GridSearchCV
ii. For the models with two hyperparameters, you will need to search
both simultaneously to find the optimum combination
c. Now apply the tuned model with the highest validation accuracy for each of the
four model types to the test set. Report the test accuracy of each model.
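The steps above can be sketched for a single model as follows. This is a minimal illustration, not a reference solution: `load_digits` stands in for the course's Digit Recognition data (an assumption), `tune_and_test` is a hypothetical helper name, and 5-fold cross-validation with a fixed `random_state` is an arbitrary choice. Only k-NN is shown; the other three models would follow the same pattern with their own grids.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled digits data stands in for the course data set.
X, y = load_digits(return_X_y=True)

# Step 1a: stratified 80/20 split; the test set is not touched until step 2c.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.20, random_state=0
)


def tune_and_test(model, param_grid, X_train, y_train, X_test, y_test, cv=5):
    """Hypothetical helper returning one row of the results table."""
    # Step 2a: cross-validated accuracy of the model with default parameters.
    default_acc = cross_val_score(
        model, X_train, y_train, cv=cv, scoring="accuracy"
    ).mean()

    # Step 2b: exhaustive grid search, scored by cross-validated accuracy.
    search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)

    # Step 2c: GridSearchCV refits the best setting on the full training set
    # by default, so score() evaluates that refitted model on the test set.
    test_acc = search.score(X_test, y_test)
    return default_acc, search.best_score_, search.best_params_, test_acc


results = tune_and_test(
    KNeighborsClassifier(),
    {"n_neighbors": [1, 3, 5, 7, 9]},
    X_train, y_train, X_test, y_test,
)
default_acc, tuned_acc, best_params, test_acc = results
print(f"default CV accuracy: {default_acc:.4f}")
print(f"tuned CV accuracy:   {tuned_acc:.4f} with {best_params}")
print(f"test accuracy:       {test_acc:.4f}")
```

Because the test set is scored only once, after the hyperparameters are fixed, the final number is not biased by the selection process. A hold-out validation set (option 1.b.ii) would replace the `cv` folds with a single extra split of the training data.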
Fill the following table with the information.
Model | Default validation accuracy | Tuned validation accuracy | Selected hyperparameters | Final test set accuracy
SVM | | | |
k-NN | | | |
Decision Trees | | | |
Logistic Regression | | | |