- Review the materials from sections 18.7.2 and 18.7.4 of Russell and Norvig, in particular the algorithm presented in Figure 18.24. For additional resources, consider the Wikipedia articles on Perceptrons, Feedforward neural networks, and Backpropagation.
- Download the `Perceptron.java` file. It implements a perceptron network with a single output node and uses a step activation function. Currently the training data is set up to learn the Boolean **and** function.
- Implement the `trainNetwork()` method with a fixed number of ten training episodes.
- Given a *learning rate* of 0.1, how many training episodes does the network take to learn Boolean **and**?
- Modify the *learning rate*. Can you learn in one training episode? If so, what is the learning rate?
- Set the *learning rate* back to 0.1. Now modify the *threshold* value. Can you learn in one training episode? If so, what is the *threshold* value?
- What is the relationship between the *learning rate* and the *threshold* value in the context of questions (4) and (5)?
- Modify the training data so as to attempt to learn the **xor** Boolean function. Experiment with different values for the *learning rate* and the *threshold* value. What appears to be the problem with the perceptron network when attempting to learn **xor**?

- Experiment with the following tuning parameters: *number of training episodes*, *learning rate*, and *hidden layer size*. Currently they are set to 1,000, 0.1, and 3, respectively. This is not sufficient to learn **xor** reliably. Change those parameters to values that train the network efficiently and reliably. Add to your report several sets of values with which you were able to train the network efficiently and reliably. Briefly state what you consider to be reliable. Additionally, address any relationships you notice among the parameters. Note that the `testNetwork()` method prints the actual output and the desired output.
- Now modify the `XOR.java` file so that the network uses a step activation function rather than a sigmoid activation function. Hint: recall that the derivative of the step function is not defined for all values. Experiment with values of the number of training episodes, the learning rate, the threshold of the activation function, and the hidden layer size. Add to your report several sets of values with which you were able to train the network efficiently and reliably. Briefly state what you consider to be reliable. Additionally, address any relationships you notice among the parameters.
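The hint about the step function's derivative matters because backpropagation multiplies each weight update by the activation's derivative. The sketch below contrasts the two functions; the constant substitute used for the step "derivative" is one common workaround and is an assumption here, not the approach required by `XOR.java`.

```java
// Sketch contrasting the sigmoid and step activations for backpropagation.
// The step function's true derivative is 0 everywhere it is defined (and
// undefined at the threshold), so a substitute is needed for updates to
// propagate; a constant 1.0 is one assumed choice, not the only one.
public class Activations {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Sigmoid derivative expressed in terms of its output y = sigmoid(x).
    static double sigmoidDerivative(double y) {
        return y * (1.0 - y);
    }

    static double step(double x, double threshold) {
        return x >= threshold ? 1.0 : 0.0;
    }

    // Placeholder "derivative" for the step function (see comment above).
    static double stepDerivativeSubstitute(double y) {
        return 1.0;
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));           // 0.5
        System.out.println(sigmoidDerivative(0.5)); // 0.25, the steepest point
        System.out.println(step(0.4, 0.5));         // 0.0
    }
}
```

Whatever substitute you choose, it interacts with the learning rate and threshold, which is part of what your experiments should surface.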

There are two portions to this assignment.

- First, you need to curate the data from the MNIST data set so
that it can be used for training purposes. We recommend that you:
- Learn about the byte-level layout of an IDX file.
- Study which Java classes and methods you will use to read bytes from an IDX file.
- Format and arrange the bytes so that they can be used to train the given neural network.
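The steps above can be sketched as follows, assuming the published IDX layout: a 32-bit big-endian magic number, the dimension sizes as 32-bit big-endian integers, then one unsigned byte per pixel. The class and method names are placeholders, and scaling pixels to the 0..1 range is an assumption suited to a sigmoid network.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hedged sketch of curating MNIST image data from an IDX file; the class
// name and the 0..1 scaling are assumptions, not the assignment's API.
public class IdxReader {
    public static double[][] readImages(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            int magic = in.readInt(); // 0x00000803 marks an IDX image file
            int count = in.readInt(); // number of images
            int rows  = in.readInt(); // 28 for MNIST
            int cols  = in.readInt(); // 28 for MNIST
            double[][] images = new double[count][rows * cols];
            for (int i = 0; i < count; i++) {
                for (int p = 0; p < rows * cols; p++) {
                    // readUnsignedByte() yields 0..255; scale to 0..1 so the
                    // values are in a range a sigmoid network handles well.
                    images[i][p] = in.readUnsignedByte() / 255.0;
                }
            }
            return images;
        }
    }
}
```

`DataInputStream` reads multi-byte values in big-endian order, which matches the IDX format directly, so no manual byte shuffling is needed.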

- For the second part of this assignment, we ask that you experiment with the various parameters so as to efficiently and reliably train the network. In your report, document the number of training episodes required to obtain the sort of precision you deem sufficient. Please justify your decision for the precision. Additionally, include the values of the following parameters: the range of the initial weights, the learning rate and the number and size of the hidden layers. Address how long it takes to train your network, given your parameters.

- Your report on the experiments you conducted.
- Your modified `XOR.java` and `Training.java` files.