FLAML icon indicating copy to clipboard operation
FLAML copied to clipboard

Task based refactor

Open markharley opened this issue 3 years ago • 3 comments

Why are these changes needed?

  • Refactor to facilitate improvement of time-series fitting.
  • Delegate task specific logic to a Task sub-class, simplifying the AutoML class.
  • Introduce a TimeSeriesDataset class for encapsulation of time-series relevant data operations.
  • Introduce a TimeSeriesEstimator parent class to standardise time-series models.

Related issue number

N/A

Checks

  • [X] I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.
  • [X] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
  • [X] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [X] I've made sure all auto checks have passed.

markharley avatar Sep 19 '22 17:09 markharley

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

:x: markharley sign now
You have signed the CLA already but the status is still pending? Let us recheck it.

ghost avatar Sep 19 '22 17:09 ghost

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

gmdiana-hershey avatar Sep 20 '22 20:09 gmdiana-hershey

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

The [forecast] is installed in certain environments: https://github.com/microsoft/FLAML/actions/runs/3084237483/workflow#L57

sonichi avatar Sep 21 '22 15:09 sonichi

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

The holidays are only used in time series forecasting which is optional. Make sure the import statement is invoked only under the time series environment. See https://github.com/microsoft/FLAML/blob/main/flaml/automl.py#L1096 as an example for nlp (another optional environment).

liususan091219 avatar Sep 26 '22 14:09 liususan091219

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

Hey! Thanks for the review @gmdiana-hershey. I've added dataclasses as a Python version dependent install requirement, thanks for spotting it! Holidays turned out not to be used outside of the tests at present so I've removed the unnecessary test in test_forecast

markharley avatar Oct 02 '22 22:10 markharley

Was looking at the failed build and noticed that there were some import issues.

  • Python 3.6 doesn't include data classes, but there is a backport that could be added to the build script (and setup).
  • Also saw that importing holidays fails. That's not part of the [test] requirements in setup.py (see https://github.com/markharley/FLAML/blob/fafd6524c9466d8ab9c9049bb6102c34cd5819ec/setup.py#L91).
  • It sounds like new tests were added for time series tasks? Should the requirements in [ts_forecast] and [forecast] also be included in [test] in that case?

The holidays are only used in time series forecasting which is optional. Make sure the import statement is invoked only under the time series environment. See https://github.com/microsoft/FLAML/blob/main/flaml/automl.py#L1096 as an example for nlp (another optional environment).

Hey! Thanks for the review @liususan091219. I've applied your relative import suggestions. On holidays, it turns out that it wasn't called outside of tests at present and so I've removed the offending test and redundant code paths

markharley avatar Oct 02 '22 22:10 markharley

Would it be more reasonable to put the "nlp" and "time_series" folders into the "automl" folder?

Hey! I think we ended up with this layout to avoid some circular imports, but I'll have another try at refactoring these into the automl subpackage :smile:

markharley avatar Oct 02 '22 22:10 markharley

Some checks failed. I wonder if it will be easier to break the PR down to smaller PRs. For example, the first PR to make is to just create the automl subpackage and move files to corresponding locations. Otherwise this PR would take a long time to merge and you will get conflicts often.

sonichi avatar Oct 02 '22 23:10 sonichi

Thank you @markharley! We indeed need to re-organize the whole structure of the flaml folder considering the need of adding an automl folder. I am attaching a proposal for the new structure (considering all content in flaml, not just the changes involved in this PR).

In this PR, perhaps you can just make .py files about automl and the time_series folder in the right place. We can come up with a plan with the other changes (and perhaps also discuss this proposed structure plan in the maintainer meeting on 10/10).

  • automl
    • nlp
    • time_series automl.py data.py ml.py model.py train_log.py config.py [and other files need to be added in the automl folder]
  • tune
    • searcher
    • scheduler [and existing files in the tune foler]
  • onlineml
  • default

qingyun-wu avatar Oct 03 '22 01:10 qingyun-wu