
Public datasets in `etna.datasets`

Open martins0n opened this issue 2 years ago • 0 comments

🚀 Feature Request

We have synthetic datasets now. They are helpful for model sanity checks and testing, but real data is more complex.

Motivation

  • Performance testing on datasets frequently used in papers
  • Richer examples
  • Benchmarking in a unified framework

Proposal

  • Add load_dataset(dataset_name: str, ..., *args, **kwargs) -> Dataset
  • Dataset
class Dataset:
    train: Lazy[TSDataset]
    test: Lazy[TSDataset]
    dataset_path: Optional[pathlib.Path]
    metadata: dict
  • Datasets to add: M4 (each seasonality is treated as an independent dataset), M5, Household Electric Power Consumption, M3, Web Traffic Time Series Forecasting.
  • On-disk format: JSONL. Each line is {"target": array<float>, "segment": str, "horizon": int, "freq": int ...}
  • Data is loaded from a direct link.
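
A rough sketch of how the lazy container and the JSONL format above could fit together, using only the standard library. The names `Lazy.get`, `read_jsonl`, `load_dataset_from_dir` and the `train.jsonl` / `test.jsonl` file layout are assumptions made for illustration, not part of ETNA. Plain lists of records stand in for `TSDataset`, and downloading from a direct link is left out.

```python
import json
import pathlib
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Hypothetical wrapper: runs the loader on first access and caches the result."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._loaded = False
        self._value: Optional[T] = None

    def get(self) -> T:
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value  # type: ignore[return-value]


def read_jsonl(path: pathlib.Path) -> list:
    """Parse one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


@dataclass
class Dataset:
    # The proposal uses Lazy[TSDataset]; lists of raw records stand in here.
    train: Lazy
    test: Lazy
    dataset_path: Optional[pathlib.Path] = None
    metadata: dict = field(default_factory=dict)


def load_dataset_from_dir(dataset_path: pathlib.Path) -> Dataset:
    """Assumed layout: <dataset_path>/train.jsonl and <dataset_path>/test.jsonl."""
    return Dataset(
        train=Lazy(lambda: read_jsonl(dataset_path / "train.jsonl")),
        test=Lazy(lambda: read_jsonl(dataset_path / "test.jsonl")),
        dataset_path=dataset_path,
        metadata={"name": dataset_path.name},
    )


# Usage: write one record per split, then load lazily.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    record = {"target": [1.0, 2.0, 3.0], "segment": "a", "horizon": 1, "freq": 1}
    for name in ("train.jsonl", "test.jsonl"):
        (root / name).write_text(json.dumps(record) + "\n", encoding="utf-8")
    dataset = load_dataset_from_dir(root)
    train_records = dataset.train.get()  # the file is read here, not at construction
```

Keeping the splits lazy means that constructing a `Dataset` costs nothing until a split is actually requested, which matters for large collections such as M5.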

Test cases

No response

Alternatives

No response

Additional context

  • Before publishing copies of the data, we should clarify possible licence issues.

Checklist

  • [ ] I discussed this issue with ETNA Team

martins0n, Jan 28 '22 12:01