· all posts

Be careful about overfitting when learning new skills

Overfitting is a term used in statistics to describe when a model follows a set of data too closely. When a model follows the data too closely, predictions made from the model are not reliable. The model is not able to capture the general trend of the data. Rather, it describes the data exactly as it is.

Predictions made from overfitted models will, on average, be further away from the true result, once it is observed, compared to a more general model whose average error is lower.

Although overfitting is a technical term used in statistics, the same idea is, interestingly, a way to describe the act of learning something in a way that isn’t useful in the future. That is, learning something as a step-by-step procedure to be memorized and repeated rather than seeing the general idea, knowing the specific ways it has been applied, and then making decisions in novel situations that resemble the prior experience, but not exactly.


One example might be playing piano. It’s possible to memorize exactly what keys to press, in exactly what order, to play a song. While this is certainly one way to do it, the end result is that you’re able to only play one song. Without knowing a little bit about how notes harmonize together, without providing some room for creative error, new songs cannot be created. One approach to playing the piano might be rote memorization, not even knowing how to read sheet music or what a note is, but that skill is not easy to generalize to future piano playing. It’s not possible to build upon that kind of knowledge when learning the second song. The second song is just another thing to memorize.

Now I doubt that many people actually play the piano that way, completely memorizing the keys and the order in which to play them without knowing anything about music, but hopefully it’s a tangible example.

On the other hand, another approach to playing songs on the piano might be understand the fundamentals of music, the notes, how to read sheet music, certain patterns of playing notes together, and so on. When learning the second song, it’s then possible to reuse the experience of learning the first song, to build upon it. With the knowledge of the terminology and patterns, you can relate the second song to the first, see how it’s different and how it’s similar. You might even be able to add your own twist because, although the new tune may be unfamiliar, you have a general sense of how good songs sound and how to reproduce those sounds from scratch, tweaking them along the way to change how it’s heard.


A similar idea applies to other creative pursuits as well, where you have the opportunity to do something a little differently each time. In business, having a general idea of industry trends may be more useful than knowing exactly what happened on what date in the past year. Of course, one might start from the specific events, but to make accurate predictions, one needs to see the general patterns too. It may be true that many people were making online banking and file sharing websites in the late 1990s. If an investor were to predict that online banking will continue to grow, that’s mostly correct, but it misses the bigger picture: that many different services can be facilitated through the web. In this example, the notion of drawing a line on a chart between data points fits well. The more accurate the investor’s predictions, the better off the investments will be because the rate of error will be lower than a model that follows each data point wherever it may go, because surely a data point will show up unexpectedly far away from a line which likes to hug the data points.


I don’t know if there’s a word to describe this idea, essentially taking the term overfitting from statistics and using it in a much broader sense to describe how people learn and make decisions, but I feel it’s an important idea to keep in mind.

A lot of information is easy accessible these days, so the skill of generalizing based on past trends is in greater demand than being able to perform a detailed though repetitive task.

At the same time, there’s the idea of underfitting in statistics, where the model is lacking and is missing specific pieces of data.

The key is to not overfit when learning, but also to not underfit. Know the specifics well, but know that novel situations may deviate as well.