Dimensionality Reduction T4 - Use of problematic Iris dataset
W1D5 Dimensionality Reduction Tutorial 4: Part 1 https://youtu.be/2Zb93aOWioM?t=147
https://en.wikipedia.org/wiki/Iris_flower_data_set:
Fisher's paper was published in the journal, the Annals of Eugenics, creating controversy about the continued use of the Iris dataset for teaching statistical techniques today.
https://armchairecology.blog/iris-dataset/
One of the points of the paper (and of the journal, and of Fisher’s leading role in developing biometry and biostatistics) was to propose a methodological framework to delineate desirable traits, in support of eugenics programs. One does not publish in the Annals of Eugenics in 1936 on a misunderstanding.
A penguin-based alternative: https://twitter.com/allison_horst/status/1270046399418138625 https://allisonhorst.github.io/palmerpenguins/articles/pca.html
Palmer Penguins is an R package, but there are instructions for using it in Python here: https://towardsdatascience.com/data-analysis-in-python-getting-started-with-pandas-8cbcc1500c83. I understand that pandas is banned here, but I'd be shocked if this dataset hasn't been added to a package that is already used (and if it hasn't, could it be?).
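For what it's worth, here is a rough sketch of how the penguin measurements could be read and run through PCA with only the standard library and NumPy, so no pandas is needed. It is not the tutorial's code, just an illustration. It assumes a CSV export of the dataset with the four measurement columns named as in the palmerpenguins package, and missing values written as `NA` or left empty; the file name in the usage comment is hypothetical.

```python
import csv

import numpy as np

# Measurement columns as named in the palmerpenguins data (assumed to match the CSV header).
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def read_numeric_columns(lines, columns=FEATURES):
    """Parse CSV lines into a float array, dropping rows with missing values."""
    rows = []
    for record in csv.DictReader(lines):
        try:
            rows.append([float(record[name]) for name in columns])
        except (ValueError, TypeError):
            # "NA", empty, or absent fields cannot be converted; skip the row.
            continue
    return np.array(rows, dtype=float)


def pca(X, n_components=2):
    """Standardize the columns of X and project onto the leading components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained[:n_components]


# Usage, assuming a local copy of the data (hypothetical path):
# with open("penguins.csv", newline="") as f:
#     X = read_numeric_columns(f)
# scores, components, explained = pca(X)
```

The columns are standardized before the SVD because the measurements are on very different scales (millimetres versus grams); whether that is the right choice for the tutorial is a judgment call.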
Other non-penguin-based alternatives are probably also available.
Penguins is indeed very fun and serves the same pedagogical goals.