
[Draft] class `Auto` for automatic optimal model search

Open · martins0n opened this issue · 0 comments

N.B. Blocked by #854 and #853.

🚀 Feature Request

Create an `etna.auto.Auto` class that searches for the optimal solution (pipeline) within a defined config pool.

  • The config pool can be extended.
  • We use Optuna to orchestrate the search.
  • Optuna can be parallelized via runners.
  • `objective` is a decorator-style wrapper that passes additional arguments to the Optuna objective.
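To make the runner idea concrete, here is a minimal stdlib sketch, assuming a runner is simply a callable that executes the same job `n_jobs` times, either sequentially or in a thread pool. `LocalRunner` and `ParallelLocalRunner` below are hypothetical illustrations, not the runner classes proposed for etna. In a real setup each job would call `study.optimize(...)` on a study loaded from shared storage (Optuna supports this via `optuna.create_study(..., storage=..., load_if_exists=True)`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class LocalRunner:
    """Hypothetical runner: runs every job sequentially in the current process."""

    def __call__(self, job: Callable[[], T], n_jobs: int) -> List[T]:
        return [job() for _ in range(n_jobs)]


class ParallelLocalRunner:
    """Hypothetical runner: runs the jobs concurrently in a thread pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def __call__(self, job: Callable[[], T], n_jobs: int) -> List[T]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = [executor.submit(job) for _ in range(n_jobs)]
            # Collect results in submission order; .result() re-raises any job exception.
            return [future.result() for future in futures]


# Usage: both runners accept the same job, so the search code does not depend on the runner.
trials_per_job = ParallelLocalRunner(max_workers=2)(lambda: 10, n_jobs=3)
print(trials_per_job)  # [10, 10, 10]
```

Swapping the runner changes only how the jobs are scheduled, not what the objective computes.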

Workflow:

  • Initialize `Auto` with the desired parameters.
  • Start `fit` with the chosen `TSDataset`; you should specify the number of trials or a timeout. Optionally, you can pass an `initializer` (for example, to set up loggers) or a `callback` to customize handling of the backtest results.
  • Call `stack_best` to get a stacking ensemble of the best pipelines, or just take the single best pipeline.
  • Get all results with aggregated statistics via the `runs_result` method.
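The workflow can be sketched without etna or Optuna. This is a hypothetical, dependency-free toy: `ToyAuto`, the dict-based pool entries and the `backtest` callable are assumptions for illustration, and plain random search stands in for Optuna's sampler.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Optional


class ToyAuto:
    """Hypothetical illustration of the proposed workflow (not the etna API)."""

    def __init__(self, pool: List[Dict], backtest: Callable[[Dict], List[float]],
                 metric_aggregation: str = "mean") -> None:
        self.pool = pool
        self.backtest = backtest
        self.aggregate = {"mean": statistics.mean, "median": statistics.median}[metric_aggregation]
        self.trials: List[Dict] = []

    def fit(
        self,
        n_trials: Optional[int] = None,
        timeout: Optional[float] = None,
        initializer: Optional[Callable[[Dict], None]] = None,
        callback: Optional[Callable[[Dict, List[float]], None]] = None,
        seed: int = 0,
    ) -> Dict:
        if n_trials is None and timeout is None:
            raise ValueError("specify n_trials or timeout")
        rng = random.Random(seed)
        start = time.monotonic()
        n_done = 0
        # Stop at whichever limit is reached first.
        while (n_trials is None or n_done < n_trials) and (
            timeout is None or time.monotonic() - start < timeout
        ):
            config = rng.choice(self.pool)  # random search stands in for Optuna's sampler
            if initializer is not None:
                initializer(config)  # e.g. set up loggers for this trial
            scores = self.backtest(config)
            if callback is not None:
                callback(config, scores)  # e.g. custom handling of backtest results
            self.trials.append({"pipeline": config["name"], "metric": self.aggregate(scores)})
            n_done += 1
        return self.best()

    def runs_result(self) -> List[Dict]:
        """All trials, best (lowest metric) first."""
        return sorted(self.trials, key=lambda row: row["metric"])

    def best(self) -> Dict:
        return self.runs_result()[0]


pool = [{"name": f"pipeline_{i}", "base_error": float(i)} for i in range(1, 4)]
auto = ToyAuto(pool, backtest=lambda cfg: [cfg["base_error"], cfg["base_error"] + 1.0])
started = []
best = auto.fit(n_trials=10, initializer=lambda cfg: started.append(cfg["name"]))
print(best)
```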

Proposal


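from typing import Callable, Dict, List, Literal, Optional, Union

import optuna
from pandas import DataFrame

from etna.datasets import TSDataset
from etna.ensembles import StackingEnsemble
from etna.metrics import Metric
from etna.pipeline import Pipeline

# Pool, Runner and LocalRunner are proposed in this issue and do not exist yet.


class Auto:

    def __init__(
        self,
        metric: Metric,
        metric_aggregation: Literal["mean", "median"],
        backtest_params: Dict,
        experiment_folder: str,
        horizon: int,
        pool: Union[Pool, List[Pipeline]] = Pool.default,
        runner: Optional[Runner] = None,  # LocalRunner is used if None
        storage: Optional[optuna.storages.BaseStorage] = None,
    ):
        pass

    def fit(
        self,
        ts: TSDataset,
        n_trials: Optional[int] = None,
        timeout: Optional[float] = None,
        initializer: Optional[Callable] = None,
        callback: Optional[Callable] = None,
        **optuna_kwargs,
    ) -> Pipeline:
        """Run the search; at least one of `n_trials` or `timeout` should be set."""
        pass

    def stack_best(
        self,
        n_best: int = 5,
    ) -> StackingEnsemble:
        pass

    def runs_result(self) -> DataFrame:
        # returns a table with columns: | pipeline | metrics | path |
        pass

    @staticmethod
    def objective(
        ts: TSDataset,
        metric: Metric,
        metric_aggregation: Literal["mean", "median"],
        backtest_params: Dict,
        callback: Optional[Callable] = None,
        initializer: Optional[Callable] = None,
    ) -> Callable[[optuna.trial.Trial], float]:
        """Return an Optuna-style objective that runs a backtest and calls `initializer` and `callback`."""
        pass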
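One possible shape for `objective` is a closure factory: it captures the extra arguments and returns a function of a single `trial`, which is what `study.optimize` expects. In this stdlib sketch the pool maps hypothetical pipeline names to zero-argument backtest functions that return one metric value per segment, and `FakeTrial` is a hypothetical stand-in that mimics only `suggest_categorical` and `set_user_attr` from `optuna.trial.Trial`.

```python
import statistics
from typing import Callable, Dict, List, Optional

SegmentScores = Dict[str, float]


class FakeTrial:
    """Hypothetical stand-in for optuna.trial.Trial (only the two methods used below)."""

    def __init__(self, choice_index: int) -> None:
        self.choice_index = choice_index
        self.user_attrs: Dict[str, object] = {}

    def suggest_categorical(self, name: str, choices: List[str]) -> str:
        return choices[self.choice_index % len(choices)]

    def set_user_attr(self, key: str, value: object) -> None:
        self.user_attrs[key] = value


def objective(
    pool: Dict[str, Callable[[], SegmentScores]],
    metric_aggregation: str = "mean",
    initializer: Optional[Callable[[str], None]] = None,
    callback: Optional[Callable[[str, SegmentScores], None]] = None,
) -> Callable[[FakeTrial], float]:
    """Capture the extra arguments and return a one-argument, Optuna-style objective."""
    aggregate = {"mean": statistics.mean, "median": statistics.median}[metric_aggregation]

    def _objective(trial) -> float:
        name = trial.suggest_categorical("pipeline", sorted(pool))
        if initializer is not None:
            initializer(name)  # e.g. configure loggers for this trial
        scores = pool[name]()  # hypothetical backtest: metric value per segment
        if callback is not None:
            callback(name, scores)  # e.g. persist the backtest results
        trial.set_user_attr("pipeline", name)
        return aggregate(list(scores.values()))

    return _objective


pool = {
    "pipeline_1": lambda: {"a": 1.0, "b": 3.0, "c": 8.0},
    "pipeline_2": lambda: {"a": 2.0, "b": 2.0, "c": 2.0},
}
print(objective(pool)(FakeTrial(0)))  # 4.0
print(objective(pool, "median")(FakeTrial(0)))  # 3.0
```

Because the returned function only needs `trial`, the same closure can be handed to any runner without extra plumbing.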

Test cases

No response

Additional context


stateDiagram-v2
    fit: Auto.fit
    State2: Optuna starts the search over the defined pool
    State3: pipeline_1
    State4: pipeline_n
    State5: Optuna storage
    fit --> State2
    State2 --> State3
    State2 --> State4
    State3 --> State5
    State4 --> State5
    note right of State5
        Metrics and configs for extended analysis
    end note
    stack_best: Auto.stack_best
    runs_result: Auto.runs_result
    note left of  runs_result: returns table with fields | pipeline | metric1 | metric2 ... | path |
    runs_result --> State5
    stack_best --> State5
    note left of stack_best
        returns StackingEnsemble
    end note
    note left of fit
        start greedy search with Optuna using the chosen runner
    end note
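In the diagram, the storage feeds both `runs_result` and `stack_best`. A minimal sketch of that idea with the standard-library `sqlite3` module follows (the sequence diagram below names SQLite as the default storage), assuming a hypothetical single-metric `runs` table with SMAPE as the example metric; Optuna's own RDB storage uses a different, richer schema. `stack_best` would pass the selected pipelines to a `StackingEnsemble`; here only their names are selected.

```python
import sqlite3
from typing import List, Tuple


def open_storage(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical `runs` table: one row per finished trial."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS runs (pipeline TEXT, smape REAL, path TEXT)")
    return conn


def save_trial(conn: sqlite3.Connection, pipeline: str, smape: float, path: str) -> None:
    conn.execute("INSERT INTO runs (pipeline, smape, path) VALUES (?, ?, ?)", (pipeline, smape, path))
    conn.commit()


def runs_result(conn: sqlite3.Connection) -> List[Tuple[str, float, str]]:
    """Rows of | pipeline | metric | path |, best (lowest SMAPE) first."""
    return conn.execute("SELECT pipeline, smape, path FROM runs ORDER BY smape ASC").fetchall()


def best_pipelines(conn: sqlite3.Connection, n_best: int = 5) -> List[str]:
    """Distinct pipelines ranked by their best SMAPE: candidates for a stacking ensemble."""
    rows = conn.execute(
        "SELECT pipeline, MIN(smape) AS best FROM runs GROUP BY pipeline ORDER BY best ASC LIMIT ?",
        (n_best,),
    ).fetchall()
    return [pipeline for pipeline, _ in rows]


conn = open_storage()
for name, score, folder in [("pipeline_1", 12.5, "exp/0"), ("pipeline_2", 10.0, "exp/1"),
                            ("pipeline_1", 9.0, "exp/2"), ("pipeline_3", 20.0, "exp/3")]:
    save_trial(conn, name, score, folder)
print(runs_result(conn)[0])  # ('pipeline_1', 9.0, 'exp/2')
print(best_pipelines(conn, n_best=2))  # ['pipeline_1', 'pipeline_2']
```

Grouping by pipeline keeps a configuration that was sampled several times from occupying more than one slot in the ensemble.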


sequenceDiagram
    [*]->>+Auto: fit()
    Note over Auto,Auto: create Optuna with storage
    Note over Auto,Auto: we use sqlite storage by default
    Auto->>Optuna: tune() 
    Note over Auto,Optuna: run self.Optuna.tune(..., runner=self.runner)
    Optuna->>+Runner: __call__() 
    Note over Optuna,Runner: Optuna.study.optimize is executed in the defined Runner environment with a backtest-based objective
    Runner-->>-Optuna: return None 
    Optuna-->>Auto: return optuna.study
    Auto->>Auto: runs_result()
    Note over Auto,Auto: select the best Pipeline from the results
    Auto-->>-[*]: return best Pipeline

martins0n, Aug 11 '22 16:08