Early forms of neural networks were developed in the 1950s, but they have seen a resurgence of popularity since the early 2010s, a period marked by Apple integrating the Siri voice assistant into iOS in 2011 and the release of AlexNet in 2012. This popularity is fueled by massive increases in data and compute power, as well as by new activation functions and optimization methods that mitigate the "vanishing gradient" problem, enabling the construction of large networks with many layers that outperform classical machine learning models on a variety of tasks.
Neurons are organized into layers. Each neuron implements a linear model whose output is processed through an “activation” function:
$y = f(B_{0} + B_{1} x_{1} + B_{2} x_{2} + \dots + B_{n} x_{n})$
The hidden layers of a network often use a rectified linear unit (ReLU) activation function:
$f(x) = \max(0, x)$
while the output layer often uses a sigmoid function and essentially consists of logistic regression models. In this lab, you’re going to apply your knowledge about decision boundaries of linear classifiers to explore how neural networks perform classification. We’ll use a multilayer perceptron model, which is a type of dense, feedforward neural network.
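For concreteness, both activations are one-liners in NumPy (an illustration, not part of the lab's provided code):

```python
import numpy as np

def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0, x)

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
print(sigmoid(0.0))                      # 0.5
```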
For this lab, we are omitting a train/test split. This is not because you shouldn't use train/test splits; it is because a split does not contribute to the concepts this lab is meant to illustrate.
Import the necessary libraries and modules, and then set the NumPy random number generator seed to 42: np.random.seed(42)
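A minimal sketch of the setup these steps assume (your exact import list may differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from neurons import Input, Neuron, HStack  # provided with this lab

np.random.seed(42)  # make successive runs repeatable
```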
This makes successive runs of your code repeatable. SGDClassifier (stochastic gradient descent) and many other estimators use pseudorandom numbers to shuffle the order of the data, initialize weights, and so on. In your final version, you could remove the seed and re-run your code several times to see how consistent your results are; that is, whether your results are overly sensitive to random, lucky chances in the model fitting, or are robust to these effects. In practice, it is challenging to make modern ML algorithms fully repeatable, since there are many sources of variation: the random number generator in each library you use, library code that draws on non-deterministic sources of randomness (network packet timing, keyboard input timing, the current time in nanoseconds, etc.), and libraries that automatically tune their implementation to achieve maximum performance on your GPU hardware, to name a few.
Create and fit a multilayer perceptron with one hidden layer of 4 neurons:
mlp = MLPClassifier(hidden_layer_sizes=(4,), max_iter=1000, solver="lbfgs")
mlp.fit(X, y)  # X, y: the feature matrix and labels from your earlier steps
Extract the weight vectors for the hidden layer:
mlp_models = np.vstack([mlp.intercepts_[0], mlp.coefs_[0]]).T
The columns of this matrix correspond to $B_{0}$, $B_{1}$, and $B_{2}$. Each row corresponds to the model from a separate neuron.
Re-arrange the following equation to solve for $x_{2}$:
$0 = B_{0} + B_{1} x_{1} + B_{2} x_{2}$
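If your algebra is right, you should arrive at the boundary line (valid when $B_{2} \neq 0$):

$$x_{2} = -\frac{B_{0} + B_{1} x_{1}}{B_{2}}$$

This is a line in the $(x_{1}, x_{2})$ plane, one per hidden neuron.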
Make one figure (subplots may help) that visualizes the features along with the decision boundaries for the 4 neurons in the hidden layer. Use appropriate axis limits for this visualization.
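Here is a hedged sketch of one approach; X, y, and mlp_models are assumed from the earlier steps, and the dashed line uses the rearranged boundary equation above:

```python
fig, axes = plt.subplots(2, 2, figsize=(8, 8))
x1 = np.linspace(-2, 2, 100)
for ax, (b0, b1, b2) in zip(axes.ravel(), mlp_models):
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)   # the features
    ax.plot(x1, -(b0 + b1 * x1) / b2, "k--")  # neuron's decision boundary (assumes b2 != 0)
    ax.set_xlim(-2, 2)
    ax.set_ylim(-2, 2)
plt.show()
```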
Create a meshgrid in the range of [-2, 2] along each dimension.
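For example (N = 50 is an arbitrary resolution choice):

```python
N = 50
xx, yy = np.meshgrid(np.linspace(-2, 2, N), np.linspace(-2, 2, N))
grid = np.column_stack([xx.ravel(), yy.ravel()])  # shape (N**2, 2)
```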
Plot the grid of synthetic points as a scatter plot.
Use the Input and Neuron classes in the provided neurons.py file to calculate the value of the first hidden layer neuron at each grid point. Pass the grid points into the predict() method as an $N^2 \times 2$ matrix, where $N$ is the edge length of your meshgrid matrices. You'll need some of NumPy's array manipulation routines (e.g., ravel and column_stack, as in the sketch above).
input = Input()
p_layer = Neuron([input], mlp_models[0, :])  # model for the first hidden neuron
pred = p_layer.predict(X)  # X here is the N^2 x 2 grid matrix (grid in the sketch above)
Plot the model outputs as a heatmap or contourf plot.
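For instance, reshaping the predictions back to the grid shape (assuming xx, yy, and pred from the sketches above):

```python
plt.contourf(xx, yy, pred.reshape(xx.shape), levels=20)
plt.colorbar(label="neuron output")
plt.show()
```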
Repeat for the remaining 3 neurons in the hidden layer.
Use the Input, Neuron, and HStack classes with the weights from the MLP model to recreate the hidden layer.
input = Input()
layer_1 = Neuron([input], mlp_models[0, :])
layer_2 = Neuron([input], mlp_models[1, :])
layer_3 = Neuron([input], mlp_models[2, :])
layer_4 = Neuron([input], mlp_models[3, :])
stacked = HStack([layer_1, layer_2, layer_3, layer_4])  # concatenates the 4 neuron outputs column-wise
Predict the transformed values to create a transformed feature matrix:
transformed_X = stacked.predict(X)  # one column per hidden neuron
Train a Logistic Regression model using the new features that were created by the MLP.
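A minimal sketch, assuming y holds the class labels from earlier in the lab:

```python
logreg = LogisticRegression()
logreg.fit(transformed_X, y)  # train on the MLP-transformed features
```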
Calculate the true positive rate (TPR) and false positive rate (FPR) for the predictions and plot the results in an ROC plot. The ROC plot should include the results from Experiment 1 in addition to the newly trained model.
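One way to compute the curve is scikit-learn's roc_curve on predicted probabilities (a sketch; the Experiment 1 results would be added to the same axes):

```python
from sklearn.metrics import roc_curve

scores = logreg.predict_proba(transformed_X)[:, 1]  # probability of the positive class
fpr, tpr, _ = roc_curve(y, scores)
plt.plot(fpr, tpr, label="LogisticRegression on MLP features")
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.legend()
plt.show()
```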
Put answers to the following reflection questions at the top of your notebook (after your title and name).
In Canvas, submit:
Please see the rubric on Canvas for grading criteria.
I will be looking for the following: