Add info attribute to datasets

Open: mariosasko opened this issue 5 years ago • 1 comment

As discussed via Slack, it's a good idea to add some metadata to our datasets. This idea is partially inspired by HF datasets.

To do so, we will define a new DatasetInfo dataclass like this:

from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    """..."""
    citation: str
    description: str
    homepage: Optional[str] = None

Each dataset will have a new class-level attribute, named info (or metadata), that stores this data. In DatasetABC, the attribute will default to None.
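To make the class-level attribute concrete, here is a minimal sketch of how it might look. The DatasetABC below is a simplified stand-in rather than the project's actual base class, ExampleDataset is a hypothetical dataset, and all field values are placeholders.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class DatasetInfo:
    """Metadata container with the fields proposed in this issue."""
    citation: str
    description: str
    homepage: Optional[str] = None


class DatasetABC:
    # Simplified stand-in for the real base class (assumption):
    # the base class carries no metadata by default.
    info: ClassVar[Optional[DatasetInfo]] = None


class ExampleDataset(DatasetABC):
    # Hypothetical dataset; the values below are placeholders.
    info = DatasetInfo(
        citation="(placeholder citation)",
        description="A placeholder description.",
    )
```

Because info lives on the class, it could be read without instantiating or downloading the dataset, e.g. `ExampleDataset.info.description`.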

Feel free to comment (e.g. if you think we should include some other metadata). cc @mttk @ivansmokovic @FilipBolt

mariosasko, Dec 13 '20 17:12

Some other metadata to potentially add: size in MB and number of rows (for row-like datasets). Citation could be optional, as not all datasets have a published paper.
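One way these suggestions might be folded into the dataclass is sketched below. The field names size_in_mb and num_rows are hypothetical, chosen only for illustration. Note that making citation optional means description has to come first, since dataclass fields without defaults must precede fields with defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    """Sketch of the metadata container with the extra optional fields."""
    description: str
    # Optional: not every dataset has a published paper.
    citation: Optional[str] = None
    homepage: Optional[str] = None
    # Hypothetical field names for the suggested size metadata.
    size_in_mb: Optional[float] = None
    num_rows: Optional[int] = None


# Placeholder values, for illustration only.
info = DatasetInfo(description="A placeholder description.", num_rows=1000)
```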

FilipBolt, Dec 14 '20 01:12