Support packed data sources
Datasets often come as zip or tar archives, so it would be useful for Datumaro to accept such inputs directly. Unpacking them can be undesirable because it can significantly increase storage use.
Requirements:
- Be able to read a dataset from zip, tar, etc.
- Avoid full unpacking, unless requested
- Support recursive chains of archives like `tar.gz`, `((file.zip).zip).7z`
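
To make the requirements more concrete, here is a minimal sketch of reading a nested chain of archives without writing anything to disk. It is only an illustration built on the Python standard library (`zipfile`, `tarfile`), not an existing Datumaro API; the function name `iter_packed_files` and the virtual path scheme are hypothetical. It assumes each nested archive is small enough to hold in memory, and it does not handle 7z, which has no standard library reader.

```python
import io
import tarfile
import zipfile


def iter_packed_files(data: bytes, path: str):
    """Yield (virtual_path, content) for every regular file.

    Hypothetical helper: nested zip/tar archives are opened in memory
    and traversed recursively instead of being unpacked to disk.
    """
    try:
        # "r:*" lets tarfile detect plain and compressed (gz, bz2, xz) tars.
        tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
    except tarfile.ReadError:
        tar = None  # not a tar archive, try other formats below

    if tar is not None:
        with tar:
            for member in tar:
                if member.isfile():
                    content = tar.extractfile(member).read()
                    yield from iter_packed_files(content, f"{path}/{member.name}")
        return

    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_packed_files(zf.read(info), f"{path}/{info.filename}")
        return

    # Not a recognized archive: treat it as a leaf file.
    yield path, data
```

A caller could read the outermost archive once and iterate, for example `for name, content in iter_packed_files(open("dataset.tar.gz", "rb").read(), "dataset.tar.gz"): ...`, where `dataset.tar.gz` is a placeholder file name. A real implementation would likely stream large members rather than load them whole, and would need a third-party reader for 7z.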