croissant
Partition support
There is already a howto about splits (https://github.com/mlcommons/croissant/blob/main/docs/howto/specify-splits.md) and an example (https://github.com/mlcommons/croissant/blob/main/datasets/coco2014/metadata.json).
However, we also want support for other types of partitions, namely date-based partitions and language partitions (e.g. Wikipedia, which is split per language).
Currently the validator / loader has no support for partitions. It should be possible to retrieve a single partition (or a few) and download only the files those partitions need. It should also be possible to retrieve many partitions at once (for example, several languages rather than just one).
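To make the request more concrete, here is a rough sketch of the file-selection step in Python. It is not the mlcroissant API and not a proposal for the metadata format: the `language/date/file` path layout, the `PARTITION_PATTERN` regex, and the `select_files` helper are all assumptions made up for illustration. The idea is only that partition values are extracted from file paths, and files that do not match the requested partitions are never downloaded.

```python
import re

# Hypothetical layout, assumed for this sketch only:
#   <language>/<YYYYMMDD>/<file name>
PARTITION_PATTERN = re.compile(r"^(?P<language>[a-z]+)/(?P<date>\d{8})/.*$")


def select_files(paths, **wanted):
    """Return the paths whose partition values match every requested filter.

    Each keyword argument maps a partition name (e.g. `language`) to the set
    of accepted values. Partitions that are not mentioned are left unfiltered.
    """
    selected = []
    for path in paths:
        match = PARTITION_PATTERN.match(path)
        if match is None:
            # The path does not follow the assumed layout; skip it.
            continue
        values = match.groupdict()
        if all(values.get(name) in allowed for name, allowed in wanted.items()):
            selected.append(path)
    return selected


files = [
    "en/20230301/part-0.parquet",
    "fr/20230301/part-0.parquet",
    "de/20230301/part-0.parquet",
    "en/20230401/part-0.parquet",
]

# A single partition: only the English files would be downloaded.
one_language = select_files(files, language={"en"})

# Several partitions at once: two languages, restricted to one date.
several = select_files(files, language={"en", "fr"}, date={"20230301"})
```

With something along these lines, the loader would only need to fetch the paths returned by the selection step, whether the user asked for one partition or many.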
There is no existing howto page for partitions, but I think we need one.