Remember that in my last post about neural networks, I tried (and failed) to replicate in Python the results I had obtained in R.
I have been thinking about how I would solve the problem, and frankly I wasn’t eager to spend too much time on such a silly example, especially since I’m no specialist in the PyBrain module.
The problem is, besides my occasional laziness when it comes to solving problems, I’m also quite stubborn and I don’t like to let things go that easily.
And I realized that it was a great opportunity to write about failure, and how to react when confronted with it.
Recognizing the problem
Speaking about failure implies an inability to reach a predefined goal. In my case, my goal was to provide the reader with a simple and working example of a neural network’s implementation.
But unfortunately the result was not the one expected: the predictions made by the network were useless.
So what could I do? Was I willing to spend a considerable amount of time mastering PyBrain to understand precisely what went wrong and fix it? I was not. That would surely be the best solution, but practical constraints often lead you away from such optimal responses. Whether it is for your job or for your personal projects, there is always that one best way of getting things done. But then there are also 1000 reasons keeping you from that path: a deadline soon to be met, too many projects running at the same time to pour a lot of energy into a single one, etc. So sometimes we need to adapt in order to find, not the absolute best way of doing things, but the one that will get things done given that we live in a constrained environment.
In my case, I wanted to get my network to work quickly so that I could concentrate on other projects. So back to the goal we defined earlier: building a simple neural network example with reasonable predictive power.
Taking a step back
Now, since my last approach didn’t work, and instead of desperately trying to get the initial setting I had imagined to work, let’s think again about what the network ought to do and what we’ve been doing up to this point:
- the ultimate goal is to correctly classify the shapes drawn in the images;
- the network relies on patterns discerned among the data points to make its predictions;
- we have 1024 input data points corresponding to the pixels’ values (0 if white, 1 if black);
- the network thus created fails to make robust predictions.
Based on these elements, what can be done to resolve this situation? It’s always a good first step to sum up any useful information we have at hand. And in my case, it’s not much: I know that a neural network is good at finding patterns. I know that the less noise there is in the data, the easier it’s going to be to find relevant relations between inputs and outputs. I also know that I’m basically treating my network as a black box, since when I look at the 1024 input data points I can’t infer anything about the type of relation going on there. And I don’t like the idea of letting my tool do all the work when I don’t quite understand what I should be looking for.
Simplifying to understand
If we don’t understand precisely what’s going on, maybe we should simplify our problem. That way, we may be able to provide data points with a stronger and more directly visible relation to the output. The task will then be easier for us to understand and implement, and the network should perform better if we feed it fewer data points, but more valuable information.
So what information other than the arrangement of the pixels can we provide? Look at the matrix representing the image. The lines composing the shapes are 3 pixels wide. A rectangle will thus have 3*2 long rows corresponding to its longest sides, a characteristic it doesn’t share with the other shapes. A triangle should in theory have 3 long rows at its base, and a circle none. That’s already one quite strong discriminating value. Let’s find some more. We need the length of the longest row in the image to count these repetitions, so we might as well store it as a data point. Our triangles are always oriented the same way, so the top corner produces a row with very few active pixels (1 or 2). It’s a configuration specific to the triangles. And that’s one more data point. It seems pretty robust, but let’s add a last one just to be sure. Look at the shapes: what seems to be characteristic of each one that we haven’t mentioned yet? The angles of course! A rectangle typically has 8 (you’ll understand quickly why I don’t say 4), which is less than what a triangle would have, which is in turn less than what a circle would have.
Summing up our new data points and the corresponding assumptions:
- number of repetitions of the longest row: rectangles should have 6, triangles 3, circles ?
- number of pixels in the longest row: rectangles should have the highest value
- number of pixels in the shortest row: triangles should have the lowest value
- number of angles in the image: rectangles < triangles < circles
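To make the three row-based data points concrete, here is a minimal sketch on two toy arrays. The helper name `row_features` and the toy shapes are made up for this illustration, and the lines are only 1 pixel thick (not 3 as in the real images), so the rectangle's longest row repeats 2 times instead of 6.

```python
import numpy as np

def row_features(shape_mask):
    """Return (shortest non-empty row, longest row, repetitions of the longest row)."""
    pixels_per_row = shape_mask.sum(axis=1)
    non_empty = pixels_per_row[pixels_per_row > 0]  # skip blank rows
    shortest = int(non_empty.min())
    longest = int(non_empty.max())
    repetitions = int((pixels_per_row == longest).sum())
    return shortest, longest, repetitions

# Hypothetical rectangle outline, 1 pixel thick
rectangle = np.zeros((7, 7), dtype=int)
rectangle[1, 1:6] = 1   # top side
rectangle[5, 1:6] = 1   # bottom side
rectangle[1:6, 1] = 1   # left side
rectangle[1:6, 5] = 1   # right side

# Hypothetical triangle outline, apex at the top, 1 pixel thick
triangle = np.zeros((6, 7), dtype=int)
triangle[1, 3] = 1
triangle[2, [2, 4]] = 1
triangle[3, [1, 5]] = 1
triangle[4, :] = 1      # base

print(row_features(rectangle))
print(row_features(triangle))
```

Even on these tiny shapes the pattern from the list shows up: the triangle has the shortest row (its apex) and a single longest row, while the rectangle's longest row is repeated once per long side.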
Spoiler: it works and outperforms by far the network built upon 1024 input points.
Load the modules we need:
import numpy as np
from skimage.io import imread
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import ClassificationDataSet
from pybrain.structure.modules import SoftmaxLayer
from pybrain.structure import SigmoidLayer
I may have re-factored a bit too much this time, my bad. Following are the functions we are going to use to load the images and extract the 4 data points from each of them.
def compute_rows(image_array):
    # Number of active pixels in each row of the image
    pixels_per_row = image_array.sum(axis=1)
    max_pixels_per_row = pixels_per_row.max()
    # Ignore empty rows when looking for the shortest one
    min_pixels_per_row = pixels_per_row[pixels_per_row!=0].min()
    # How many rows are as long as the longest one
    nbr_max_rows = len(pixels_per_row[pixels_per_row==max_pixels_per_row])
    return min_pixels_per_row, max_pixels_per_row, nbr_max_rows

def compute_angles(image_array):
    nrows = image_array.shape[0]-1
    ncols = image_array.shape[1]-1
    nangles = 0
    # Slide a 2*2 window over the image; an odd number of active pixels marks a corner
    for i in range(nrows):
        for j in range(ncols):
            if (image_array[i:i+2, j:j+2].sum()%2)!=0:
                nangles+=1
    return nangles

def image_to_inputs(image):
    # Pixels with value 0 in the loaded image become 1, everything else becomes 0
    image = np.where(image==0, 1, 0)
    inputs = []
    rows_info = compute_rows(image)
    for value in rows_info:
        inputs.append(value)
    angles_nbr = compute_angles(image)
    inputs.append(angles_nbr)
    return inputs

def import_dataset(path, shapes, used_for, samples_nbr):
    # shapes maps each shape name to its class label
    ds = ClassificationDataSet(4, nb_classes=3)
    for shape in sorted(shapes):
        for i in range(samples_nbr):
            image = imread(path+used_for+'/'+shape+str(i+1)+'.png', as_grey=True, plugin=None, flatten=None)
            image_inputs = image_to_inputs(image)
            ds.appendLinked(image_inputs, shapes[shape])
    return ds
A word about the algorithm I wrote for compute_angles(): we study 2*2 cells at a time, shifting by 1 cell to the right at each iteration. When the end of the row is reached, we go back to (row+1, col=0) and start again until all rows are done. An angle (corner would be more appropriate) is spotted when the number of white (or black) cells among the 4 considered is odd. Here, have a gif:
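The same scan can be tried on toy arrays. This is a minimal sketch, not the exact code used for the real images: `count_corners` and both toy shapes are names invented for the illustration. It compares a filled rectangle with a rectangle outline, which is presumably why a drawn rectangle yields 8 corners rather than 4: the outline has 4 outer corners and 4 inner ones.

```python
import numpy as np

def count_corners(shape_mask):
    """Count the 2*2 windows that contain an odd number of active pixels."""
    nrows, ncols = shape_mask.shape
    corners = 0
    for i in range(nrows - 1):          # top-left row of the window
        for j in range(ncols - 1):      # top-left column of the window
            if shape_mask[i:i + 2, j:j + 2].sum() % 2 == 1:
                corners += 1
    return corners

# Hypothetical filled rectangle: only the 4 outer corners are detected
filled = np.zeros((6, 6), dtype=int)
filled[1:5, 1:5] = 1

# Hypothetical rectangle outline: 4 outer corners plus 4 inner corners
outline = np.zeros((8, 8), dtype=int)
outline[1:7, 1:7] = 1
outline[2:6, 2:6] = 0

print(count_corners(filled))
print(count_corners(outline))
```

Windows lying along a straight edge always contain 2 active pixels and windows fully inside or outside the shape contain 4 or 0, so only windows sitting on a corner (1 or 3 active pixels) are counted.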
Now to load the datasets:
ds_training = import_dataset('C:/Users/alexis.matelin/Documents/Neural Networks/Visual classification/shapes/', shapes, "training", 15)
ds_testing = import_dataset('C:/Users/alexis.matelin/Documents/Neural Networks/Visual classification/shapes/', shapes, "testing", 8)
ds_training._convertToOneOfMany(bounds=[0, 1])
ds_testing._convertToOneOfMany(bounds=[0, 1])
And give it a try! Compared to last time, I changed the hidden layer activation function from Tanh to Sigmoid because our inputs are no longer binary. I played around with the other parameters to find the best set-up. The trainer is limited to 2000 epochs because that’s enough to train a robust model. Otherwise it would keep training until convergence is reached, without any improvement in predictive accuracy.
I trained 20 networks and got 100% correct answers on the testing set for 18 of these trials (pretty good, don’t you think?).
net = buildNetwork(4, 10, 3, hiddenclass=SigmoidLayer, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, learningrate=0.01, momentum=0.01, weightdecay=0.01, dataset=ds_training)
trainer.trainUntilConvergence(maxEpochs=2000)
out = net.activateOnDataset(ds_testing)
out = out.argmax(axis=1)
print(out)
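To turn the printed predictions into a score, the predicted classes can be compared with the expected ones. The sketch below uses only NumPy and made-up label arrays: the `accuracy` helper, the class numbering (0, 1, 2) and the misclassified image are assumptions for illustration, not values taken from the actual runs.

```python
import numpy as np

def accuracy(predicted, expected):
    """Share of samples whose predicted class matches the expected class."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    return float((predicted == expected).mean())

# Hypothetical expected labels: 3 classes with 8 testing images each
expected = np.repeat([0, 1, 2], 8)

# Hypothetical predictions: pretend one image was misclassified
predicted = expected.copy()
predicted[5] = 2

print(accuracy(predicted, expected))
```

With the real network, `predicted` would be the `out` array above and `expected` the true class of each testing image, in the order in which the images were added to the dataset.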
There you go! So the conclusion is: don’t give up, take your time, try to think differently about the issue you’re confronted with and you’re probably going to end up with a better solution than the one you imagined initially. If you can’t see a way out and the complexity is just too much for what you can handle right now, simplify as much as you can while keeping track of your objective. Very often, once broken down into simple but essential pieces, a problem is much easier to solve.
In short: follow the KISS principle.