podium
Add info attribute to datasets
As discussed via Slack, it's a good idea to add some metadata to our datasets. This idea is partially inspired by HF datasets.
To do so, we will define a new DatasetInfo dataclass like this:
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    """..."""
    citation: str
    description: str
    homepage: Optional[str] = None
Each dataset will have a new class-level info (or metadata) attribute to store this data. In DatasetABC, this attribute will be set to None.
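To make the class-level attribute concrete, here is a rough sketch of how it could look. The `DatasetABC` below is only a stand-in for the real base class, `ExampleDataset` is a hypothetical subclass, and the field values are placeholders, so treat this as an illustration rather than the final design.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class DatasetInfo:
    """Metadata describing a dataset (fields as proposed in this issue)."""
    citation: str
    description: str
    homepage: Optional[str] = None


class DatasetABC:
    # Stand-in for the real base class; only the proposed attribute is shown.
    # The base class carries no metadata, so the attribute defaults to None.
    info: ClassVar[Optional[DatasetInfo]] = None


class ExampleDataset(DatasetABC):
    # Hypothetical concrete dataset; the values are placeholders.
    info = DatasetInfo(
        citation="<BibTeX entry>",
        description="<short description>",
    )
```

Because `info` lives on the class, it can be read without loading any data, e.g. `ExampleDataset.info.description`.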
Feel free to comment (e.g. if you think we should include some other metadata). cc @mttk @ivansmokovic @FilipBolt
Some other metadata to potentially add: size in MB and number of rows (for row-like datasets). The citation could be optional, since not all datasets have a published paper.
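A possible shape for the extended dataclass, purely as a sketch: the field names `size_in_mb` and `num_rows` are made up for illustration, and `description` is moved first because dataclass fields without defaults must precede fields with defaults once `citation` becomes optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetInfo:
    """Sketch of DatasetInfo with the extra fields suggested above."""
    description: str
    # Optional because not every dataset has a published paper.
    citation: Optional[str] = None
    homepage: Optional[str] = None
    # Hypothetical field names; both stay None when the value is unknown.
    size_in_mb: Optional[float] = None
    num_rows: Optional[int] = None


# Example with placeholder values for a dataset without a paper.
info = DatasetInfo(description="<short description>", num_rows=1000)
```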