The Unreasonable Effectiveness of Training With Jitter (i.e., How to Reduce Overfitting)

Auro Tripathy
3 min read · Dec 27, 2021

In many scenarios where we’re learning from a small dataset, an overfitted model is a likely outcome. By that we mean the model may perform acceptably on the training data but does not generalize well to test data.

In this post, we highlight a simple yet powerful way to reduce overfitting. For the impatient, please dive straight into the code.

The Dataset

Our dataset has just 31 two-dimensional points distributed equally across two classes. I came across this dataset in Russell Reed’s seminal book, Neural Smithing (page 282). To my knowledge the data isn’t available on the internet, so I had to recreate it by hand (it was fun to work with a ruler and pencil). See my handiwork below. The two classes are represented by the ‘+’ and ‘o’ symbols.

Converting from the analog domain (paper) to digital (a file) gives us the 31 points spread across two classes.
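
As a concrete picture of the digitized data, here is a minimal sketch of how the points might be loaded, assuming they ended up as “x y label” rows in a plain text file; the file name and exact format here are hypothetical, and the repo may store the data differently.

import numpy as np

# Hypothetical layout: one "x y label" row per point, 31 rows in total;
# the repo's actual file name and format may differ.
data = np.loadtxt('two_class_points.txt')
X = data[:, :2]   # the 2-D coordinates
y = data[:, 2]    # the class label, e.g. 0 for 'o' and 1 for '+'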

The Model

The model is a very simple 2/50/10/1 Multi-Layer Perceptron (MLP) network, the same one used in Russell Reed’s book. Note that I (inadvertently) switched the hidden layers to 2/10/50/1 instead of 2/50/10/1, which is probably why the decision boundary does not look similar to the one in the book.

The model is captured below.
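
For a concrete picture, here is a minimal sketch of that 2/10/50/1 network, assuming PyTorch; the activation functions are illustrative, and the definitive version lives in classify.py.

import torch.nn as nn

# Illustrative 2/10/50/1 MLP: 2 inputs, hidden layers of 10 and 50 units,
# and a single sigmoid output for the binary class.
model = nn.Sequential(
    nn.Linear(2, 10),
    nn.Tanh(),
    nn.Linear(10, 50),
    nn.Tanh(),
    nn.Linear(50, 1),
    nn.Sigmoid(),
)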

Trained to Intentionally Overfit

Per Russell Reed, “With 671 weights, but only 31 training points, the network is very under-constrained and chooses a very nonlinear boundary”. And it does turn out that way, as you can see below.

Smoothing with Jitter

Per the book, “training with jittered data discourages sharp changes in the response near the training points and so discourages the network from overly complex boundaries.” Following the guidance from the book, we do not change any of the training hyperparameters; the only difference is that, during training, we jitter the data as we feed it into the net.

The function to jitter the input is specified below.
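
In spirit, it looks like the sketch below, assuming zero-mean Gaussian noise and PyTorch tensors; the noise scale here is illustrative, and the authoritative version is in utils.py.

import torch

def jitter(inputs, sigma=0.05):
    # Add small, zero-mean Gaussian noise to every coordinate of every
    # training point; sigma = 0.05 is an illustrative scale, not
    # necessarily what utils.py uses.
    return inputs + sigma * torch.randn_like(inputs)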

We notice that, for the same number of epochs and the same batch size (effectively, the same hyperparameters), the training regime is unable to overfit on the meager dataset (however hard we try).
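
To see where the jitter plugs in, here is a rough sketch of the training loop, assuming PyTorch and the illustrative jitter() helper above; fresh noise is drawn for every batch, so the network never sees exactly the same point twice.

def train(model, loader, loss_fn, optimizer, num_epochs, use_jitter=False):
    # Standard mini-batch loop; the only change when use_jitter is True is
    # the fresh Gaussian noise added to each incoming batch via the
    # illustrative jitter() helper sketched above.
    for _ in range(num_epochs):
        for xb, yb in loader:
            if use_jitter:
                xb = jitter(xb)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()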

Summary

To summarize, we went from the intentionally overfitted situation on the left-hand side to a better-generalizing one on the right-hand side by adding small amounts of jitter to the input data.

Code, Repeating my Results, and Further Experiments

The code repo has two scripts, classify.py and utils.py. The default mode is to simply train a classifier on the small dataset with intentional overfitting (no jitter in the input data).

python3 classify.py

The training yields the decision boundary plot, Known-Overfit.png.

To jitter the inputs, type:

python3 classify.py --jitter

Training with jitter yields the decision boundary plot, Noise-Added-to-Smooth-boundary.png.

Reference

Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, by Russell D. Reed and Robert J. Marks II, A Bradford Book (MIT Press).
