Maybe add titanic data set

Open markarios opened this issue 7 years ago • 3 comments

Could be used in a "hello world" ML demo.

markarios avatar Jan 25 '18 07:01 markarios

If there's interest in adding the titanic data set, maybe add all the data sets that come standard with R and scikit-learn?

  • R => https://vincentarelbundock.github.io/Rdatasets/datasets.html
  • scikit-learn => http://scikit-learn.org/stable/datasets/#

sebg avatar Feb 12 '18 05:02 sebg

Sure, interesting idea. Looking around I see:

  • http://www.public.iastate.edu/~hofmann/data/titanic.html - Class, Age, Sex, Survived, string values, 2201 lines
  • https://ww2.amstat.org/publications/jse/v3n3/datasets.dawson.html - Class, Age, Sex, Survived, integer values, 2201 lines
  • http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html - survived, age, passenger-class, sex, fare - 887 lines
  • https://www.kaggle.com/c/titanic/data - PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked - 891 lines

My guess is that folks are most interested in the Kaggle version of this data, not the R version or any of the versions with ~2k passengers but fewer fields?
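
For anyone comparing these files locally, here is a minimal Python sketch that guesses which of the variants listed above a CSV is, based only on its header row. The column names are copied from the list in this comment; the variant labels and the `identify_variant` helper are hypothetical names made up for illustration, and the actual header spelling in each downloaded file may differ from what is listed here.

```python
import csv
import io

# Column sets as listed in the thread. The labels on the left are
# hypothetical names chosen for this sketch, not official dataset names.
KNOWN_VARIANTS = {
    "kaggle": ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
               "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"],
    "stanford-cs109": ["survived", "age", "passenger-class", "sex", "fare"],
    "class-age-sex-survived": ["Class", "Age", "Sex", "Survived"],
}


def normalize(name):
    """Compare column names case-insensitively, ignoring surrounding spaces."""
    return name.strip().lower()


def identify_variant(csv_text):
    """Return the label whose column set matches the CSV header, or None."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    found = {normalize(column) for column in header}
    for label, columns in KNOWN_VARIANTS.items():
        if found == {normalize(column) for column in columns}:
            return label
    return None


if __name__ == "__main__":
    # Header-only input; no passenger rows are needed to identify the variant.
    print(identify_variant("Class,Age,Sex,Survived\n"))
```

Matching on the header alone cannot separate the two ~2k-line versions, since they share the same four columns and differ only in whether the values are strings or integers.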

tmcw avatar Mar 27 '18 23:03 tmcw

Instead of being the Decider on this one, I decided to write a notebook, rdatasets, that provides access to the titanic datasets, as well as many others, along with some documentation.

tmcw avatar Nov 13 '18 20:11 tmcw