etna
Public datasets in `etna.datasets`
🚀 Feature Request
We have synthetic datasets now. They are helpful for model sanity checks and testing, but real data is more complex.
Motivation
- Performance testing on datasets frequently used in papers.
- Enhancing examples.
- Benchmarking in a unified framework.
Proposal
- Add `load_dataset(dataset_name: str, ..., *args, **kwargs) -> Dataset`
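- `Dataset` container returned by `load_dataset`:
  ```python
  from __future__ import annotations  # `Lazy` below is proposed, not an existing type
  import pathlib
  from typing import Optional

  from etna.datasets import TSDataset


  class Dataset:
      train: Lazy[TSDataset]  # loaded lazily, on first access
      test: Lazy[TSDataset]
      dataset_path: Optional[pathlib.Path]
      metadata: dict
  ```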
- Datasets to add: M4 (each seasonality as an independent dataset), M5, Household Electric Power Consumption, M3, Web Traffic Time Series Forecasting.
- On-disk format: JSONL, where every line is a record like `{"target": array<float>, "segment": str, "horizon": int, "freq": int ...}`
- Data is downloaded from a direct link.
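A minimal, stdlib-only sketch of how the proposed pieces could fit together. It rests on several assumptions that are not part of ETNA: plain dicts (segment name to target values) stand in for `TSDataset`, `functools.cached_property` plays the role of the proposed `Lazy`, the last `horizon` points of each series form the test part, and the JSONL file is already on disk (the download step is omitted). `read_records`, `split_record` and this `load_dataset` signature are hypothetical.

```python
import json
import pathlib
import tempfile
from dataclasses import dataclass, field
from functools import cached_property
from typing import Dict, List, Optional, Tuple

# Stand-in for TSDataset: segment name -> target values (assumption, not ETNA's type).
Series = Dict[str, List[float]]


def read_records(path: pathlib.Path) -> List[dict]:
    """Parse one JSON record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def split_record(record: dict) -> Tuple[List[float], List[float]]:
    """Assumption: the last `horizon` points of each series form the test part."""
    target, horizon = record["target"], record["horizon"]
    cut = len(target) - horizon
    return target[:cut], target[cut:]


@dataclass
class Dataset:
    dataset_path: Optional[pathlib.Path]
    metadata: dict = field(default_factory=dict)

    @cached_property
    def _records(self) -> List[dict]:
        # Read the file only on first access to `train` or `test`, then cache it.
        if self.dataset_path is None:
            raise ValueError("dataset_path is not set")
        return read_records(self.dataset_path)

    @cached_property
    def train(self) -> Series:
        return {r["segment"]: split_record(r)[0] for r in self._records}

    @cached_property
    def test(self) -> Series:
        return {r["segment"]: split_record(r)[1] for r in self._records}


def load_dataset(dataset_name: str, root: pathlib.Path) -> Dataset:
    """Hypothetical loader: expects `<root>/<dataset_name>.jsonl` on disk (no download)."""
    path = root / f"{dataset_name}.jsonl"
    return Dataset(dataset_path=path, metadata={"name": dataset_name})


if __name__ == "__main__":
    records = [
        {"target": [1.0, 2.0, 3.0, 4.0], "segment": "a", "horizon": 2, "freq": 1},
        {"target": [5.0, 6.0, 7.0], "segment": "b", "horizon": 1, "freq": 1},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        lines = "\n".join(json.dumps(r) for r in records) + "\n"
        (root / "toy.jsonl").write_text(lines, encoding="utf-8")
        ds = load_dataset("toy", root)
        print(ds.train)  # {'a': [1.0, 2.0], 'b': [5.0, 6.0]}
        print(ds.test)   # {'a': [3.0, 4.0], 'b': [7.0]}
```

With `cached_property`, nothing is read until `train` or `test` is first accessed, and the file is parsed at most once per `Dataset` instance.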
Test cases
No response
Alternatives
No response
Additional context
- Before publishing copies of the data, we should clarify possible licence issues.
Checklist
- [ ] I discussed this issue with ETNA Team