dime-data-handbook
Ch5: Feedback
We are thankful for the opportunity to share our feedback as part of the final review (#476), and we appreciate the effort DIME is putting into disseminating these valuable guidelines and resources.
Here are some ideas, coming especially from the perspective of the Data Partnership. I'd be more than happy to collaborate.
Ideas
- The chapter focuses on a project's most time-consuming phase, data preparation, and it offers good recommendations in that regard for its intended audience of Stata and R users. However, as datasets and research questions grow more complex, new challenges arise even though most concepts in the book remain valid: for example, performance issues, running out of memory, or running on the cloud or a distributed cluster.
- Probably out of scope, but it would be great to have a section on cloud computing environments and resources, such as JupyterHub, AWS SageMaker or Google Colab.
- Probably out of scope, but Python is an indispensable part of a modern analytics stack, and there are considerations that might be useful when using Python or, more specifically, when working on a data science project.
- Probably out of scope, but the same goes for containerization with Docker.
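To make the out-of-memory point a bit more concrete, here is a minimal Python sketch of one common pattern: streaming a file row by row and keeping only running aggregates, instead of loading the whole dataset at once. It assumes a CSV input; the function name `grouped_means` and the columns `district` and `income` are hypothetical and only serve as an illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def grouped_means(handle: TextIO, group_col: str, value_col: str) -> Dict[str, float]:
    """Compute the mean of value_col per group_col, reading one row at a time.

    Only running sums and counts per group are held in memory, so memory use
    grows with the number of groups rather than the number of rows.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(handle):
        value = row[value_col]
        if value == "":  # skip missing values instead of failing on float("")
            continue
        sums[row[group_col]] += float(value)
        counts[row[group_col]] += 1
    return {group: sums[group] / counts[group] for group in sums}


if __name__ == "__main__":
    # Hypothetical sample data; with a real file you would pass
    # open("large_file.csv", newline="") instead of a StringIO.
    sample = io.StringIO("district,income\nA,100\nB,50\nA,300\nB,\n")
    print(grouped_means(sample, "district", "income"))
```

The same idea scales up through tools such as chunked readers in pandas or distributed engines on a cluster, which is where guidance beyond the current chapter would help.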