machine-learning-articles

Good coding practices for Data Science

Open khuyentran1401 opened this issue 4 years ago • 0 comments

TL;DR

An efficient workflow for data science

Article Link

https://towardsdatascience.com/good-coding-practices-for-data-science-e9237783784c

Author

Key Takeaways

Code organization

  • Specification files: Keep parameters in YAML or JSON files. Benefit: the same code can be run in different ways without any code changes.
  • Utilities: Keep generic, reusable functions in their own files so they can be carried over to future projects.
  • Core functionality: Split the project pipeline into separate files (data extraction, data exploration, data engineering, modeling). Benefit: each stage can be changed and run without rerunning the entire pipeline, and the project is easier to review.
  • Main executable: main.py runs the entire pipeline. Keep it short so that someone else can see how the files fit together.
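
As a rough illustration of the layout above, the sketch below folds a specification, two pipeline stages, and a short main() into one file so it runs on its own. In a real project each stage would live in its own module and the specification in a separate JSON or YAML file. All names here (DEFAULT_SPEC, test_fraction, extract_data, engineer_features, split) are hypothetical and do not come from the article.

```python
import json

# Hypothetical specification. In practice this would be read from a file
# such as config.json so parameters can change without touching the code.
DEFAULT_SPEC = '{"test_fraction": 0.25}'


def load_spec(text):
    """Parse the JSON specification into a dict of parameters."""
    return json.loads(text)


def extract_data():
    """Stand-in for the data extraction stage (normally reads a file or database)."""
    return [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]


def engineer_features(rows):
    """Stand-in for the data engineering stage: center values around their mean."""
    mean = sum(rows) / len(rows)
    return [r - mean for r in rows]


def split(rows, test_fraction):
    """Hold out the last `test_fraction` of the rows as a test set."""
    n_test = int(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def main(spec_text=DEFAULT_SPEC):
    """Short entry point that only wires the stages together."""
    spec = load_spec(spec_text)
    rows = engineer_features(extract_data())
    return split(rows, spec["test_fraction"])


if __name__ == "__main__":
    train, test = main()
    print(len(train), len(test))
```

Changing test_fraction in the specification changes the split without editing any function, which is the benefit the specification-file bullet describes.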

Documentation

Maintain a README that explains how to use the code and keeps track of changes, so that others can read your code and understand how to run it.

Commenting

Add a comment at the top of every file describing what the file does. It helps you stay organized and helps readers understand the purpose of each file.
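
One possible shape for such a header in Python is a module docstring. The file name, the fields (Purpose, Inputs, Outputs), and the min_max_scale function below are assumptions made for illustration, not a format the article prescribes.

```python
"""data_engineering.py (hypothetical file name)

Purpose: turn raw numeric values into model-ready features.
Inputs:  a list of raw numeric values from the extraction step.
Outputs: a list of values scaled to the range [0, 1].
"""


def min_max_scale(values):
    """Scale values linearly to [0, 1]; return zeros if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when the input is constant.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```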

Version Control

Benefits: easier collaboration and the ability to revert to an older version. Useful for experimenting, editing, and comparing different versions.

Automated testing

Use unittest to validate that the different parts of the code behave as expected.
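
A minimal sketch of what this could look like with Python's standard unittest module. The function under test, drop_missing, is a hypothetical utility invented for this example. In a real project it would be imported from one of the pipeline files rather than defined next to the tests.

```python
import unittest


def drop_missing(rows):
    """Hypothetical utility: remove None entries from a list of records."""
    return [r for r in rows if r is not None]


class TestDropMissing(unittest.TestCase):
    def test_removes_none(self):
        self.assertEqual(drop_missing([1, None, 2]), [1, 2])

    def test_keeps_clean_input_unchanged(self):
        self.assertEqual(drop_missing([1, 2, 3]), [1, 2, 3])


def run_tests():
    """Load and run the test case explicitly, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDropMissing)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

When the tests live in their own file, running `python -m unittest` from the project root discovers and runs them without the run_tests helper.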

Useful Code Snippets

Useful Tools

Comments/ Questions

  • Now that we know these techniques, we can adopt them gradually for more efficient project management.
  • Things that we could add to this workflow:
  1. Metrics and logging: keep track of metrics and data with MLflow
  2. A tool for easily creating a comprehensible config: Hydra (hydra.cc)
  3. If the workflow of one project proves efficient, we can turn it into a template with Cookiecutter

khuyentran1401 · Apr 10 '20 22:04