That’s right: the most commonly used architecture for AI right now was invented in the 1940s, yet didn’t see widespread use until the last couple of decades.
But why? Did we just overlook an obviously powerful method of machine learning?
Not quite. The reasons we didn’t use neural networks in most AI research are a bit more complex than that.
One of the biggest reasons is that neural networks require a massive amount of data to train effectively. A general rule of thumb is that the number of data points required to train a model is the square of the number of weights it has. So even a model with just 100 weights (a tiny model by any standard) would need 10,000 data points to get a good result.
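To make that arithmetic concrete, here’s a minimal sketch of the rule of thumb described above. The function name is made up for illustration, and the quadratic rule itself is just the informal heuristic from this post, not a formal result from learning theory:

```python
def data_points_needed(num_weights: int) -> int:
    """Rough rule of thumb from above: training data needed grows
    with the square of the weight count. Illustrative only."""
    return num_weights ** 2

# Even a "tiny" 100-weight model already wants ~10,000 examples,
# and the requirement explodes as models get bigger.
for weights in (100, 1_000, 1_000_000):
    print(f"{weights:>9,} weights -> {data_points_needed(weights):>15,} data points")
```

Run that for a million weights (still small by modern standards) and you get a trillion data points, which makes it obvious why this was a non-starter for most of the twentieth century.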
Before the internet, that much data was incredibly difficult to come by, and computer hardware wasn’t yet capable of storing it anyway.
Data wasn’t the only bottleneck, either. In a similar vein, computers were simply too slow; it’s really only in the last decade or so that hardware has finally caught up with the demands of very large, general-purpose, accurate neural networks.