
Add starter Data Science Project

Open dmitrypolo opened this issue 7 years ago • 9 comments

One of the other things @isms mentioned was the ability to include a starter project for users who are just getting started. There is now an option for this in cookiecutter.json, so creating a new project will prompt you for it. If you choose to include the starter project, it places code in the predefined locations, specifically the src/data and src/models directories.

The Makefile also includes a target that runs the whole data pipeline end to end: it downloads data from a URL, splits it, fits a model to the data, and pickles the model. Finally, it makes predictions and displays them to the user. The starter project also includes unit tests that exercise the actual logic of the functions, to give beginners an idea of how to write unit tests that use patching and fixtures.

Conversely, if a user opts out of the starter project, all of those files are emptied out and left blank; a post-generation hook in the hooks directory handles this. Please let me know if you have any questions or feedback, thanks!
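The opt-out behavior described above can be sketched as a cookiecutter post-generation hook. This is a minimal sketch, not the code from this PR: the option name `include_starter_proj`, the `blank_starter_files` helper, and the exact list of starter files are assumptions for illustration. It relies on cookiecutter running `hooks/post_gen_project.py` from the root of the generated project.

```python
"""Sketch of hooks/post_gen_project.py that blanks starter files on opt-out.

Assumptions: the option name `include_starter_proj`, the helper
`blank_starter_files`, and STARTER_FILES are hypothetical, for illustration only.
"""
from pathlib import Path

# In a real hook, cookiecutter renders a Jinja expression here before running
# the script, e.g. "{{ cookiecutter.include_starter_proj }}". A literal "y" is
# used so that running this sketch outside cookiecutter changes nothing.
INCLUDE_STARTER = "y"

# Hypothetical list of starter files, relative to the generated project root.
STARTER_FILES = [
    "src/data/make_dataset.py",
    "src/models/train_model.py",
    "src/models/predict_model.py",
]


def blank_starter_files(project_root: Path, include_starter: str, files=STARTER_FILES) -> list:
    """Empty each starter file unless the user opted in; return the paths blanked."""
    if include_starter.strip().lower() in ("y", "yes"):
        return []  # user kept the starter project, leave everything as generated
    blanked = []
    for rel in files:
        path = project_root / rel
        if path.is_file():
            path.write_text("")  # keep the file in place but drop its contents
            blanked.append(path)
    return blanked


if __name__ == "__main__":
    # Cookiecutter runs post-generation hooks with the generated project as cwd.
    blank_starter_files(Path.cwd(), INCLUDE_STARTER)
```

Emptying the files rather than deleting them keeps the template's directory layout intact for users who opt out, which matches the behavior described above.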

dmitrypolo, Jul 28 '18

@dmitrypolo Any objection to swapping out the data set for a different one?

isms, Jul 31 '18

@isms no objections, do you have anything specific in mind?

dmitrypolo, Jul 31 '18

Since iris is pretty played out, how about the UC Irvine ML Repository blood donation dataset? It mirrors our blood donations competition.

isms, Jul 31 '18

I will reconvene with the team and get back to you shortly, thanks!

dmitrypolo, Jul 31 '18

@dmitrypolo @johnkarlen Is it this one or #135 that we should be looking at? Would love to get this merged this week!

isms, Aug 06 '18

@isms this one. I will adjust some things based on your other comments and finish modifying the tests for the new dataset. I'll get back to you shortly.

dmitrypolo, Aug 06 '18

@isms please review and let me know

dmitrypolo, Aug 07 '18

This is really useful and should be included; it's very helpful for seeing how the template is intended to be used.

mattarderne, Jan 10 '20


Agreed. I heavily referenced this PR ~2 years ago to figure out best practices for using the CC template... It would still be super useful to have it merged.

joel-aws, Jul 21 '20