datasets
Maybe add the Titanic data set
It could be used in a hello-world ML demo.
If there's interest in adding the Titanic data set, maybe also add all the data sets that come standard with R and scikit-learn?
R => https://vincentarelbundock.github.io/Rdatasets/datasets.html
scikit-learn => http://scikit-learn.org/stable/datasets/#
Sure, interesting idea. Looking around I see:
- http://www.public.iastate.edu/~hofmann/data/titanic.html - Class, Age, Sex, Survived - string values, 2201 lines
- https://ww2.amstat.org/publications/jse/v3n3/datasets.dawson.html - Class, Age, Sex, Survived - integer values, 2201 lines
- http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html - survived, age, passenger-class, sex, fare - 887 lines
- https://www.kaggle.com/c/titanic/data - PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked - 891 lines
My guess is that folks are most interested in the Kaggle version of this data, not the R version or any of the versions with ~2k passengers but fewer fields?
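To make the field trade-off concrete, here is a rough Python sketch that compares the four headers listed above. The source labels (`iastate`, `jse`, `stanford`, `kaggle`) and the alias table that treats `Pclass` and `passenger-class` as the same field as `Class` are my own assumptions for illustration, not something the datasets define.

```python
# Column headers as listed in this thread, keyed by an informal label
# for each source (the labels themselves are made up for this sketch).
COLUMNS = {
    "iastate": ["Class", "Age", "Sex", "Survived"],
    "jse": ["Class", "Age", "Sex", "Survived"],
    "stanford": ["survived", "age", "passenger-class", "sex", "fare"],
    "kaggle": ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
               "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"],
}

# Assumption: these spellings all refer to the passenger class field.
ALIASES = {"pclass": "class", "passenger-class": "class"}


def normalize(name):
    """Lowercase a column name and map known aliases to one spelling."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def shared_fields(columns):
    """Return the normalized fields present in every version, sorted."""
    sets = [{normalize(c) for c in cols} for cols in columns.values()]
    return sorted(set.intersection(*sets))


def extra_fields(columns, source):
    """Return the normalized fields one version has beyond the shared ones."""
    shared = set(shared_fields(columns))
    return sorted({normalize(c) for c in columns[source]} - shared)


if __name__ == "__main__":
    print("shared:", shared_fields(COLUMNS))
    print("kaggle only or partly shared:", extra_fields(COLUMNS, "kaggle"))
```

Under those assumptions, every version shares the class, age, sex, and survived fields, and the Kaggle header carries eight more on top of them.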
Instead of being the Decider on this one, I decided to write a notebook, rdatasets, that provides access to the Titanic datasets, as well as many others, with some documentation to go along with them.