
RayDistributor for using Ray to distribute the calculations in tsfresh

Open TheaperDeng opened this issue 1 year ago • 1 comments

Ray is becoming popular for building distributed applications, and it fits easily into tsfresh through a RayDistributor.

Distributed tsfresh on Ray

This repo adds a new RayDistributor that lets tsfresh use Ray to distribute its calculations.

RayDistributor is a subclass of tsfresh's IterableDistributorBaseClass and follows the development instructions at https://tsfresh.readthedocs.io/en/latest/text/tsfresh_on_a_cluster.html.
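To illustrate the pattern such a distributor follows (partition the data into chunks, distribute a function over the chunks, then flatten the results), here is a minimal self-contained sketch. It deliberately uses neither tsfresh nor Ray: `SketchDistributor`, `square_chunk` and the thread pool backend are hypothetical stand-ins, and the method names only mirror the ones described in the tsfresh documentation linked above. It is not the implementation in this PR; a Ray-backed version would presumably submit remote tasks in `distribute` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import chain, islice


class SketchDistributor:
    """Hypothetical stand-in for an iterable distributor (not tsfresh code)."""

    def __init__(self, n_workers):
        self.n_workers = n_workers
        # Assumption: a thread pool stands in for the Ray cluster.
        self.pool = ThreadPoolExecutor(max_workers=n_workers)

    @staticmethod
    def partition(data, chunk_size):
        """Yield successive lists holding at most chunk_size items."""
        iterator = iter(data)
        while True:
            chunk = list(islice(iterator, chunk_size))
            if not chunk:
                return
            yield chunk

    def distribute(self, func, partitioned_chunks, kwargs):
        """Apply func to every chunk on the worker pool."""
        return self.pool.map(partial(func, **kwargs), partitioned_chunks)

    def map_reduce(self, map_function, data, function_kwargs=None, chunk_size=2):
        """Partition, distribute, then flatten the per-chunk results."""
        chunks = self.partition(data, chunk_size)
        results = self.distribute(map_function, chunks, function_kwargs or {})
        return list(chain.from_iterable(results))

    def close(self):
        """Release the worker pool."""
        self.pool.shutdown()


def square_chunk(chunk, offset=0):
    """Toy per-chunk calculation used only for this illustration."""
    return [x * x + offset for x in chunk]


distributor = SketchDistributor(n_workers=4)
result = distributor.map_reduce(square_chunk, range(5), {"offset": 1})
distributor.close()
print(result)
```

In this simplified sketch the mapped function receives a whole chunk; the backend-specific part is confined to `distribute` and `close`, which is why swapping in a different execution engine stays a small change.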

Quick Start

Use RayDistributor the same way as MultiprocessingDistributor, ClusterDaskDistributor or LocalDaskDistributor.

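from tsfresh import extract_features
from tsfresh.utilities.distribution import RayDistributor

# Create a distributor that spreads the work across 4 Ray workers
distributor = RayDistributor(n_workers=4)
# ...
extracted_features = extract_features(..., distributor=distributor)
# ...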

Code change summary

  • Add the RayDistributor definition in tsfresh.utilities.distribution
  • Add RayDistributor documentation in docs/text/tsfresh_on_a_cluster.rst
  • Update pre-commit-config to enable future development
  • Update test-requirements.txt for the unit tests
  • Manually ran the unit tests and the documentation build locally

TheaperDeng, Jun 22 '23 09:06

@nils-braun It would be great to get some suggestions on how to avoid changing the pre-commit-config version, as well as feedback on the PR itself.

TheaperDeng, Jun 28 '23 01:06