
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.

daskperiment

.. image:: https://img.shields.io/pypi/v/daskperiment.svg
   :target: https://pypi.python.org/pypi/daskperiment/

.. image:: https://readthedocs.org/projects/daskperiment/badge/?version=latest
   :target: http://daskperiment.readthedocs.org/en/latest/
   :alt: Latest Docs

.. image:: https://travis-ci.org/sinhrks/daskperiment.svg?branch=master
   :target: https://travis-ci.org/sinhrks/daskperiment

.. image:: https://codecov.io/gh/sinhrks/daskperiment/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/sinhrks/daskperiment

Overview


`daskperiment` is a tool to perform reproducible machine learning experiments.
It allows users to define experiments and manage the history of trials
(the parameters given, the results, and the execution environment).

The package is built on `Dask`, a package for parallel computing with task
scheduling. Each experiment trial is internally expressed as a `Dask`
computation graph and can be executed in parallel.
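
A minimal sketch of defining and running an experiment, following the
decorator-style API described in the project documentation (the experiment id,
function names, and parameter values here are illustrative, and exact decorator
names may differ between versions):

.. code-block:: python

   import daskperiment

   # Create (or resume) an experiment; trial history is kept under this id
   ex = daskperiment.Experiment(id='my_experiment')

   # Declare a (hyper)parameter whose value is supplied per trial
   a = ex.parameter('a')

   # Intermediate steps are decorated with the experiment itself
   @ex
   def prepare_data(a):
       return a + 1

   # The final step is marked as the trial's result
   @ex.result
   def calculate_score(s):
       return s * 2

   # Building the pipeline is lazy; .compute() executes one trial,
   # recording the parameters and the result automatically
   res = calculate_score(prepare_data(a))
   ex.set_parameters(a=1)
   print(res.compute())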

Benefits
  • Compatibility with standard Python/Jupyter environments (and, optionally, with a standard key-value store).

    • No need to set up server applications
    • No need to register for any cloud service
    • Runs in standard or customized Python shells
  • Intuitive user interface

    • Few modifications to existing code are needed
    • Trial histories are logged automatically (no extra logging code required)
    • Dask-compatible API
    • Easily accessible experiment history, via basic pandas operations (see the sketch after this list)
    • Less management work in Git (no need to create a branch per trial)
    • (Experimental) Web dashboard to manage trial history
  • Traceability of experiment-related information

    • Trial results and their (hyper)parameters

    • Code contexts

    • Environment information

      • Device information
      • OS information
      • Python version
      • Installed Python packages and their versions
      • Git information
  • Reproducibility

    • Check function purity (each step should return the same output for the same inputs)
    • Automatic random seeding
  • Auto saving and loading of previous experiment history

  • Parallel execution of experiment steps

  • Experiment sharing (see the backend sketch after this list)

    • Redis backend
    • MongoDB backend
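
For example, past trials can be retrieved as a pandas DataFrame and filtered
with ordinary pandas operations, and a shared backend can be selected when the
experiment is created. The URI-style `backend` argument and the Redis address
below are assumptions for illustration; check the documentation for the exact
form your version accepts:

.. code-block:: python

   # Trial history as a pandas.DataFrame (one row per trial)
   history = ex.get_history()
   print(history.head())

   # Sharing trials across users or machines via a Redis backend
   # (assumes a Redis server listening at localhost:6379)
   shared = daskperiment.Experiment(id='shared_experiment',
                                    backend='redis://localhost:6379/0')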

Future Scope


- More efficient execution

  - Skip a step's execution when its input parameters are unchanged from a previous trial
  - Distributed execution