d2l-en
Overfitting on high-order polynomial function
This is not really an issue, just a comment on the example of a high-order polynomial function illustrating overfitting (4.4 Model Selection, Underfitting, and Overfitting): when I ran the example repeatedly, most of the time the training and test losses agreed very well; only one or two runs out of 10 showed overfitting. Also, when I used learning rate lr=0.1, both losses fit almost perfectly, and no overfitting was seen at all.
A model's behavior can indeed vary with several factors, including the specific dataset, the hyperparameters, and the random initialization of the weights. In this example the synthetic data are drawn with fresh random noise, so rerunning the example naturally produces a different train/test gap each time; your observations also show how sensitive the outcome is to hyperparameters such as the learning rate.
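For instance, here is a minimal sketch of the repeated-runs experiment (not the book's training code; the uniform inputs, sample sizes, and exact least-squares fit are illustrative assumptions, while the generating cubic is meant to match the chapter's):

```python
import numpy as np

def run_trial(seed, degree=20, n_train=100, n_test=100, noise=0.1):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_train + n_test)
    # Generating cubic as in the chapter: 5 + 1.2x - 3.4x^2/2! + 5.6x^3/3! + noise
    y = 5 + 1.2 * x - 3.4 * x**2 / 2 + 5.6 * x**3 / 6 + rng.normal(0, noise, x.shape)
    x_tr, y_tr, x_te, y_te = x[:n_train], y[:n_train], x[n_train:], y[n_train:]
    coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares fit; may warn about conditioning at degree 20
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    return tr, te

for seed in range(10):
    tr, te = run_trial(seed)
    print(f"seed {seed}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Running this shows that the size of the train/test gap fluctuates from seed to seed, which is consistent with what you saw across your 10 runs.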
Overfitting occurs when a model captures noise in the training data rather than the underlying pattern, and its severity depends on the model's complexity relative to the amount of available data. The learning rate matters here because, together with the number of epochs, it controls how far gradient descent actually goes toward fitting the training set: a setting that stops short of interpolating the noise behaves much like early stopping, so adjusting lr can make the overfitting largely disappear, as you observed.
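To make that concrete, here is a sketch that trains the same degree-20 model by full-batch gradient descent for a fixed number of epochs at two learning rates and compares train/test error. The lr values, epoch count, full-batch updates, and uniform inputs are assumptions for illustration (the book's example uses minibatch SGD); the factorial feature scaling is borrowed from the chapter:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
degree, n_train, n_test = 20, 100, 100
x = rng.uniform(-1.0, 1.0, n_train + n_test)
y = 5 + 1.2 * x - 3.4 * x**2 / 2 + 5.6 * x**3 / 6 + rng.normal(0, 0.1, x.shape)
# Features x^i / i!, as in the chapter, to keep the gradient well scaled.
X = np.stack([x**i / factorial(i) for i in range(degree + 1)], axis=1)
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def train(lr, epochs=400):
    w = np.zeros(degree + 1)
    for _ in range(epochs):
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / n_train  # full-batch gradient of the MSE
        w -= lr * grad
    return np.mean((X_tr @ w - y_tr) ** 2), np.mean((X_te @ w - y_te) ** 2)

for lr in (0.01, 0.1):
    tr, te = train(lr)
    print(f"lr={lr}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The exact numbers will vary, but the point is that lr and epochs jointly determine how closely the training data (including its noise) gets fit within a fixed budget.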
It's worth noting that overfitting is only one possible failure mode, and the specific characteristics of the dataset and model determine which one you actually see. It is standard practice to experiment with hyperparameters and model architectures to find good settings for a given task, and the example highlights exactly that: the need to tune training to the problem at hand.
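In this example the capacity hyperparameter worth sweeping is the polynomial degree itself; a minimal sketch (degrees, data setup, and least-squares fitting are again illustrative assumptions) shows the classic underfit-to-overfit progression:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 5 + 1.2 * x - 3.4 * x**2 / 2 + 5.6 * x**3 / 6 + rng.normal(0, 0.1, x.shape)
x_tr, y_tr, x_te, y_te = x[:100], y[:100], x[100:], y[100:]

for degree in (1, 3, 10, 20):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # degree 1 underfits; 3 matches the generating cubic
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Degree 1 should show high error on both splits (underfitting), degree 3 should match the generating function, and the higher degrees are where any gap between train and test error appears.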
Machine learning is an empirical field, and your observations demonstrate the variability that can be encountered when working with different datasets and hyperparameter settings. It's always a good practice to thoroughly test and validate models on various datasets and under different conditions to ensure their robustness and generalization performance.