cookiecutter-data-science
Update the opinions section
We should take the opportunity to refresh the "manifesto" (i.e., the documentation website homepage) for V2, since it hasn't been updated in a long time.
https://github.com/drivendata/cookiecutter-data-science/blob/v2/docs/docs/index.md
Some topics to consider:
Tools we reference
- DAG tools: I removed a few where the links were broken or the project was obviously no longer in use, and added a few more modern options. Are there other changes we want to make?
- Vagrant: Is Vagrant still popular enough to be worth mentioning?
- Zeppelin: It's still an active Apache project, but is it relevant enough to mention prominently?
Environment management
I'm not sure we endorse virtualenv + virtualenvwrapper strongly enough to present it prominently as the default recommendation. I see three options:
- No default recommendation. This is too fragmented.
- Recommend conda as a default for data science specifically, since in a non-zero number of cases it's useful to have conda's more powerful support for installing non-Python dependencies.
- Recommend venv since it's bundled in the Python standard library.
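For reference on the venv option, here is a minimal sketch of creating an environment with nothing but the standard library. The helper name `create_project_env` and the `.venv` directory name are assumptions for illustration, not something the template currently does.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, name=".venv", with_pip=True):
    """Create a virtual environment inside project_dir using the stdlib venv module.

    The function name and the ".venv" default are hypothetical choices.
    """
    env_dir = Path(project_dir) / name
    # venv.create builds the environment directory and writes pyvenv.cfg into it.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling `create_project_env(".")` is roughly what `python -m venv .venv` does from the command line.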
Environment lockfiles
We currently recommend pip freeze, but this is no longer the best way to lock dependencies. We should probably recommend pip-compile (from pip-tools) instead. For conda, we should probably recommend conda-lock.
Not sure if it's worth mentioning other Python environment management tools that support lockfiles, like Poetry, PDM, Pipenv.
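To make the comparison concrete, here is a rough sketch of the two locking commands, built as argument lists without running anything. The file names (`requirements.in`, `requirements.txt`, `environment.yml`), the `linux-64` platform, and the helper function names are all assumptions for illustration. The practical difference from pip freeze is that pip freeze records whatever happens to be installed, while these tools resolve a lockfile from a short list of direct dependencies.

```python
import shlex


def pip_compile_command(source="requirements.in", output="requirements.txt"):
    """Build a pip-compile invocation that pins the dependencies listed in `source`.

    Both file names are illustrative defaults.
    """
    return ["pip-compile", source, "--output-file", output]


def conda_lock_command(env_file="environment.yml", platform="linux-64"):
    """Build a conda-lock invocation for a single target platform.

    The environment file name and the platform are illustrative defaults.
    """
    return ["conda-lock", "-f", env_file, "-p", platform]


# Print the commands in copy-pasteable shell form; nothing is executed here.
for command in (pip_compile_command(), conda_lock_command()):
    print(shlex.join(command))
```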
Another thought: should we cover experiment/results tracking and model management?
Also: consider adding a DAG diagram for what a data analysis pipeline graph might look like.
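As a starting point for that diagram, here is a small sketch of what such a pipeline graph might look like, using the standard library's `graphlib` (Python 3.9+). The step names are hypothetical and only meant to suggest a typical shape. Each key maps to the set of steps it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "raw_data": set(),
    "clean_data": {"raw_data"},
    "features": {"clean_data"},
    "train_model": {"features"},
    "figures": {"features"},
    "evaluate": {"train_model"},
    "report": {"evaluate", "figures"},
}


def execution_order(graph):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline)
print(" -> ".join(order))
```

Because `figures` and `train_model` both depend only on `features`, more than one valid order exists. That branching is the part a diagram would show more clearly than a linear list of steps.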
Done in #345