
Overfitting and underfitting

The green line represents an overfitted model and the black line represents a regularized model. Although the green line follows the training data most closely, it is too dependent on that data and is likely to have a higher error rate than the black line on new, unseen data.

In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably". An overfitted model is a statistical model that contains more parameters than can be justified by the data. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the noise) as if that variation represented underlying model structure.

Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data. An under-fitted model is a...
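To make the contrast concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from the post and does not reproduce the figure's models. It assumes a true relationship y = 2x + 1, ten training points at x = 0..9, and a fixed alternating +0.5/-0.5 perturbation standing in for noise so the results are reproducible. Three models are fitted: a constant (one parameter, too few to capture the trend), an ordinary least-squares line (two parameters, matching the true structure), and a degree-9 Lagrange interpolating polynomial (ten parameters, one per data point). Each is scored on the training points and on unseen midpoints compared against the noise-free truth.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def mean_squared_error(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Underfit: a single parameter (the mean of y) ignores the trend in x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line: two parameters, the true model's form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Overfit: a Lagrange polynomial of degree len(xs) - 1 that passes
    through every training point, so it also reproduces the noise."""
    def predict(x: float) -> float:
        total = 0.0
        for k, (xk, yk) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != k:
                    basis *= (x - xj) / (xk - xj)
            total += yk * basis
        return total
    return predict


def true_function(x: float) -> float:
    # Assumed underlying structure for this illustration.
    return 2.0 * x + 1.0


# Training data: the true line plus a fixed +/-0.5 perturbation as stand-in noise.
train_x: List[float] = [float(i) for i in range(10)]
train_y: List[float] = [true_function(x) + (0.5 if i % 2 == 0 else -0.5)
                        for i, x in enumerate(train_x)]

# Unseen data: midpoints between training inputs, scored against the noise-free truth.
test_x: List[float] = [i + 0.5 for i in range(9)]
test_y: List[float] = [true_function(x) for x in test_x]

results: dict = {}
for name, fit in [("constant (underfit)", fit_constant),
                  ("line", fit_line),
                  ("degree-9 interpolant (overfit)", fit_interpolant)]:
    model = fit(train_x, train_y)
    scores: Tuple[float, float] = (mean_squared_error(model, train_x, train_y),
                                   mean_squared_error(model, test_x, test_y))
    results[name] = scores
    print(f"{name:32s} train MSE = {scores[0]:8.3f}   test MSE = {scores[1]:8.3f}")
```

The interpolant reaches zero training error yet has a much larger error at the unseen midpoints than the line, because it has absorbed the perturbation as if it were structure. The constant model has high error on both sets, which is the signature of underfitting.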