
Dimensionality Reduction T4 - Use of problematic Iris dataset

Open · da5nsy opened this issue 4 years ago · 1 comment

W1D5 Dimensionality Reduction Tutorial 4: Part 1 https://youtu.be/2Zb93aOWioM?t=147

Wikipedia (https://en.wikipedia.org/wiki/Iris_flower_data_set):

Fisher's paper was published in the journal, the Annals of Eugenics, creating controversy about the continued use of the Iris dataset for teaching statistical techniques today.

https://armchairecology.blog/iris-dataset/

One of the points of the paper (and of the journal, and of Fisher’s leading role in developing biometry and biostatistics) was to propose a methodological framework to delineate desirable traits, in support of eugenics programs. One does not publish in the Annals of Eugenics in 1936 on a misunderstanding.

A penguin-based alternative: https://twitter.com/allison_horst/status/1270046399418138625 https://allisonhorst.github.io/palmerpenguins/articles/pca.html

Palmer Penguins is an R package, but instructions for using it in Python are here: https://towardsdatascience.com/data-analysis-in-python-getting-started-with-pandas-8cbcc1500c83

I understand that pandas is banned here, but I'd be shocked if this dataset hasn't already been added to a package the course uses (and if it hasn't, could it be?).
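A minimal sketch of how the tutorial's PCA step might look with penguins instead of iris. It assumes seaborn and NumPy are installed. seaborn's `load_dataset("penguins")` downloads a copy of the data (so it needs network access) and returns a pandas DataFrame, which is converted to NumPy arrays straight away so the rest of the code stays pandas-free. The helper names `load_penguin_features` and `pca_scores` are hypothetical, not part of the course, and standardizing the features is my own choice because they mix millimetres and grams.

```python
import numpy as np


def load_penguin_features():
    """Hypothetical helper: fetch Palmer Penguins via seaborn, return NumPy arrays.

    Assumes seaborn is installed and the network is reachable; pandas is only
    touched inside this function.
    """
    import seaborn as sns

    # Drop rows with missing measurements (or missing sex)
    df = sns.load_dataset("penguins").dropna()
    cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
    # Convert immediately so downstream tutorial code only sees NumPy arrays
    return df[cols].to_numpy(dtype=float), df["species"].to_numpy()


def pca_scores(X, n_components=2):
    """Project the rows of X onto their first principal components via an SVD."""
    Xc = X - X.mean(axis=0)
    # Standardize each feature, since the units differ (mm vs. g)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Rows of Vt are the principal axes; project the data onto the first few
    scores = Xs @ Vt[:n_components].T
    # Fraction of total variance captured by each retained component
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

Usage: `X, species = load_penguin_features()` followed by `scores, explained = pca_scores(X)`; the two columns of `scores` can then be scattered and coloured by `species`, much as the iris data is plotted now.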

Other non-penguin based alternatives are probably also available.

da5nsy · Jul 17 '20 22:07

Penguins is indeed very fun and serves the same pedagogical goals.

mwaskom · Jul 17 '20 23:07