Bias and variance
- refer to the random forest notes
See also the chapter on learning curves for diagnosing bias and variance: https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/6cba692b-290d-4c7b-93c9-04c3b6cdd96b/Ng_MLY06.pdf
See also the examples of bias and variance in the same book: https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/db5cc9c4-1964-420f-bce6-24835a2aa097/Ng_MLY01_05.pdf
This page is also good: https://www.dataquest.io/blog/learning-curves-machine-learning/
See Bishop (or Wikipedia) for the decomposition of the error into noise, bias and variance.
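For reference, a quick sketch of that standard decomposition, for squared loss with targets y = f(x) + eps (noise variance sigma^2, expectation taken over training sets and over the noise):

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}\big[\hat{f}(x)\big] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\big[\hat{f}(x)\big]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2_\epsilon}_{\text{noise}}
```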
From an old notebook I had on this:
From Ng
The error can be decomposed into these two components: bias is the error on the training set, variance is the difference between the errors on the test and training sets, and the total error is their sum.
TODO: find a more formal definition (this is Ng's operational version; a standard decomposition is sketched above)
You can reduce either of them, or both (much harder). The four regimes (illustrated in the sketch after this list):
- high variance / low bias: the model does not generalise - it overfits
- high bias / low variance: the model underfits
- both high: it both underfits and overfits
- both low: the ideal situation
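A minimal sketch of these regimes (a toy example added here, not from the original notes: noisy sine data, polynomials of increasing degree as the model). A low degree gives high bias (both errors high), a very high degree gives high variance (training error near zero, validation error much larger).

```python
# Toy sketch of the regimes via model complexity (polynomial degree on noisy sine data):
# degree 1 underfits (high bias: both errors high), degree 15 overfits
# (high variance: training error ~0, validation error much larger).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=30)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, validation MSE = {val_mse:.3f}")
```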
Some bias may be unavoidable: the unavoidable bias is the optimal error rate, also called the Bayes error rate. It can be estimated by having a human perform the task, which is harder when even a human has no idea how to do it.
Variance can be reduced by having more training data - there is no unavoidable variance. Variance is also reduced by regularization, though this might increase bias. Bias can be reduced with a more complex model (mind that this can increase variance, and it also costs more in computation).
Tradeoff
can't easily reduce both at the same time
Reduce (avoidable) bias
Adding training data doesn't help here.
- increase model complexity/size
- do feature engineering designed to reduce it, or add features
- reduce/remove regularization (will increase variance)
Reduce variance
- add training data
- add regularization (increases bias); see the ridge sketch after this list
- use early stopping (falls in the category of regularization, and also increases bias)
- do feature selection (might increase bias); less used in deep learning, where the network decides which features to use, more useful when the data is small
- decrease model size/complexity (might increase bias) - usually better to use regularization instead
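A quick way to see the regularization lever from the two lists above, sketched with ridge regression on polynomial features (again a toy example of mine, same spirit as the one further up): a tiny alpha barely regularizes (low bias, high variance), a huge alpha over-regularizes (high bias, low variance).

```python
# Regularization knob: ridge regression on degree-12 polynomial features.
# Tiny alpha -> almost unregularized: low training error, higher CV error (variance).
# Huge alpha -> heavily shrunk coefficients: both errors go up (bias).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

rng = np.random.RandomState(1)
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=60)

for alpha in [1e-6, 1e-2, 1.0, 100.0]:
    model = make_pipeline(PolynomialFeatures(12), StandardScaler(), Ridge(alpha=alpha))
    scores = cross_validate(model, X, y, cv=5, scoring="neg_mean_squared_error",
                            return_train_score=True)
    print(f"alpha = {alpha:g}: train MSE = {-scores['train_score'].mean():.3f}, "
          f"CV MSE = {-scores['test_score'].mean():.3f}")
```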
Learning curves
Plot the test (validation) error against the training set size: it should decrease, and if it plateaus it is telling you that adding data won't improve things any more. Also plot the training error: it should increase with the training size (mislabels, ambiguities, ...). The trend of these two curves gives an idea of whether adding more training data would help. The variance is the gap between the curves: if they are far apart, adding more data may help reduce it.
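A minimal sketch of such a plot with scikit-learn's learning_curve (the logistic regression estimator and the digits dataset below are just placeholders picked for illustration; scores are accuracies, converted to errors to match the description above):

```python
# Learning curves: training and validation error as a function of training set size.
# A large gap between the curves suggests high variance (more data may help);
# two high, close curves suggest high bias (more data won't help much).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
estimator = LogisticRegression(max_iter=2000)

train_sizes, train_scores, val_scores = learning_curve(
    estimator, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8), scoring="accuracy")

train_err = 1 - train_scores.mean(axis=1)  # error = 1 - accuracy, averaged over folds
val_err = 1 - val_scores.mean(axis=1)

plt.plot(train_sizes, train_err, "o-", label="training error")
plt.plot(train_sizes, val_err, "o-", label="validation error")
plt.xlabel("training set size")
plt.ylabel("error")
plt.legend()
plt.show()
```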
From Scott Fortmann-Roe
- gives a conceptual and graphical illustration (the bulls-eye diagram, very interesting, to reproduce here - a sketch is after this list)
- also gives a mathematical definition
- gives an example (voting intentions) that illustrates the concepts of bias and variance (to reproduce / create a similar one)
- gives an interactive example with kNN, also used throughout that page (a small sketch of the idea is after this list too)
- gives some (kinda rigorous) suggestions for dealing with the tradeoff
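An attempt at reproducing the bulls-eye picture mentioned above (my own matplotlib sketch, not Fortmann-Roe's code): each panel scatters the estimates a model would produce over repeated training sets around the true target at the centre; the offset of the cloud is the bias, its spread is the variance.

```python
# Bulls-eye picture of bias and variance (after Fortmann-Roe's diagram):
# each panel scatters the estimates from repeated "trainings" around the true
# target at the origin; spread of the cloud = variance, offset of the cloud = bias.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
cases = {
    "low bias / low variance":   (0.0, 0.2),
    "low bias / high variance":  (0.0, 1.0),
    "high bias / low variance":  (1.5, 0.2),
    "high bias / high variance": (1.5, 1.0),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
for ax, (title, (bias, spread)) in zip(axes.ravel(), cases.items()):
    hits = rng.normal(loc=bias, scale=spread, size=(30, 2))  # 30 estimates
    for r in (0.5, 1.0, 1.5, 2.0):                           # the target rings
        ax.add_patch(plt.Circle((0, 0), r, fill=False, color="grey"))
    ax.scatter(hits[:, 0], hits[:, 1], s=15)
    ax.set_title(title)
    ax.set_xlim(-2.5, 3.5)
    ax.set_ylim(-2.5, 3.5)
    ax.set_aspect("equal")
plt.tight_layout()
plt.show()
```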
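And a non-interactive version of the kNN example (again a sketch of mine, not the code from that page): k is the complexity knob, a small k gives low bias / high variance, a large k gives high bias / low variance.

```python
# kNN version of the tradeoff: k is the complexity knob.
# Small k -> flexible model: training error near 0, CV error higher (variance);
# large k -> rigid model: both errors go up (bias).
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

X, y = make_moons(n_samples=400, noise=0.35, random_state=0)

for k in [1, 5, 25, 101]:
    scores = cross_validate(KNeighborsClassifier(n_neighbors=k), X, y, cv=5,
                            scoring="accuracy", return_train_score=True)
    print(f"k = {k:3d}: train error = {1 - scores['train_score'].mean():.3f}, "
          f"CV error = {1 - scores['test_score'].mean():.3f}")
```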
References
- A. Ng, Machine Learning Yearning (draft; see the links above)
- S. Fortmann-Roe, Understanding the Bias-Variance Tradeoff