
`Dataset` in `etna.datasets`

Open martins0n opened this issue 3 years ago • 0 comments

🚀 Feature Request

from __future__ import annotations  # lets Dataset refer to Feature before it is defined

import pathlib
from typing import List, Literal, Optional

from etna.datasets import TSDataset


class Dataset:
    # train and test are exposed through the read-only properties below
    dataset_path: str  # URL or repository URL
    freq: str
    known_future: Optional[List[Feature]]
    unknown_future: Optional[List[Feature]]
    cache_path: Optional[pathlib.Path]
    metadata: dict
    tags: List[str]

    @property
    def train(self) -> TSDataset:
        pass

    @property
    def test(self) -> TSDataset:
        pass


class Feature:
    # N.B. these are just possible fields; we should use a dict instead of classes
    name: str
    type: Literal["categorical", "numeric", "str"]


def m5_generation() -> Dataset:
    # code for preprocessing the locally saved dataset
    raise NotImplementedError


def load_dataset(name: str) -> Dataset:
    # calls the generation function for the requested Dataset
    raise NotImplementedError

  • [ ] Index folder with structure:
etna/datasets/index/m4_monthly.json
...
etna/datasets/index/m5.json

Each file is a JSON configuration for TSDataset creation. It could contain special parameters for data generation, URLs, and fields for Dataset initialization.

  • [ ] A helper function produces a Dataset from the information in the JSON file.

  • [ ] Cache datasets in the JSONL format from #467 (optional; in the first iteration we can download the dataset every time).

  • [ ] We should add the datasets currently generated by code, plus M5.
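
The index file, helper, and optional cache from the checklist above could fit together roughly as follows. This is a minimal sketch using only the standard library: `DatasetSpec`, `read_spec`, `write_cache`, `read_cache`, and the contents of `EXAMPLE_INDEX_ENTRY` (including the example.com URL) are all assumptions made for illustration. The record layout used in #467 is not shown in this issue, so cached rows are treated as arbitrary dicts.

```python
import json
import pathlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical content of etna/datasets/index/m5.json; every value is a placeholder.
EXAMPLE_INDEX_ENTRY = {
    "dataset_path": "https://example.com/m5.zip",
    "freq": "D",
    "known_future": [{"name": "event", "type": "categorical"}],
    "tags": ["placeholder"],
}


@dataclass
class DatasetSpec:
    # Hypothetical mirror of one index file; field names follow the proposed Dataset class.
    name: str
    dataset_path: str
    freq: str
    known_future: List[Dict[str, str]] = field(default_factory=list)
    unknown_future: List[Dict[str, str]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)


def read_spec(name: str, index_dir: pathlib.Path) -> DatasetSpec:
    """Read `<index_dir>/<name>.json`; the file itself must not contain a `name` key."""
    with open(index_dir / f"{name}.json", encoding="utf-8") as f:
        raw = json.load(f)
    return DatasetSpec(name=name, **raw)


def write_cache(rows: List[Dict[str, Any]], cache_path: pathlib.Path) -> None:
    """Store records as JSON Lines: one JSON object per line."""
    with open(cache_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_cache(cache_path: pathlib.Path) -> Optional[List[Dict[str, Any]]]:
    """Return cached records, or None when nothing has been cached yet."""
    if not cache_path.exists():
        return None
    with open(cache_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A loader built on these pieces would call `read_cache` first and fall back to downloading from `dataset_path` when it returns `None`. In a first iteration the cache step can be skipped entirely.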

Motivation

Proposal

Test cases

No response

Alternatives

No response

Additional context

#467

Checklist

  • [ ] I discussed this issue with ETNA Team

martins0n, Feb 01 '22 15:02