[NEURIPS] Hosted Editor doesn't allow nested Fields in RecordSets

Open francois-rd opened this issue 1 year ago • 3 comments

Using the editor app hosted on HuggingFace (https://huggingface.co/spaces/MLCommons/croissant-editor), I'm trying to add a RecordSet to represent a nested JSON structure.

The format specification (https://docs.mlcommons.org/croissant/docs/croissant-spec.html#recordsets) seems to suggest that nested fields are possible, but the editor does not seem to support a nested data type (see image).

I was thinking about using a 'join' to another record set to build nested data, but my understanding is that 'join' is meant to cross-link files. My dataset contains standalone (not cross-linked) files, each containing a series of nested JSON structures (one per instance), all in JSON Lines format.

[Image: Screen Shot 2024-06-05 at 11 16 30]

francois-rd avatar Jun 05 '24 15:06 francois-rd

I got the same issue.

super-dainiu avatar Jun 05 '24 16:06 super-dainiu

Indeed, the Croissant editor does not support nested fields yet.

You can export the JSON-LD for your dataset and add the nested fields manually.

The mlcroissant Python library can be used to validate your Croissant file.
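For anyone hand-editing the exported JSON-LD, here is a minimal sketch of generating a nested field from Python instead of typing it out. It assumes the `subField` layout described in the nested-fields part of the spec linked above; the record set name `examples`, the file object id `jsonl-files`, the helper `nested_field`, and the JSONPath expressions are all hypothetical placeholders, and the exact `source` shape for your JSON Lines files may differ (for instance a `fileSet` rather than a `fileObject`).

```python
import json


def nested_field(record_set, name, file_object_id, sub_fields):
    """Build a cr:Field dict whose children are listed under "subField".

    sub_fields is a list of (sub_name, data_type, json_path) tuples.
    All ids passed in are illustrative, not taken from a real dataset.
    """
    parent_id = f"{record_set}/{name}"
    return {
        "@type": "cr:Field",
        "@id": parent_id,
        "name": parent_id,
        "subField": [
            {
                "@type": "cr:Field",
                "@id": f"{parent_id}/{sub_name}",
                "name": f"{parent_id}/{sub_name}",
                "dataType": data_type,
                "source": {
                    # Assumption: the data lives in a FileObject with this id.
                    "fileObject": {"@id": file_object_id},
                    "extract": {"jsonPath": json_path},
                },
            }
            for sub_name, data_type, json_path in sub_fields
        ],
    }


# Hypothetical nested structure: {"author": {"name": ..., "age": ...}}
field = nested_field(
    "examples",
    "author",
    "jsonl-files",
    [
        ("name", "sc:Text", "$.author.name"),
        ("age", "sc:Integer", "$.author.age"),
    ],
)
print(json.dumps(field, indent=2))
```

The printed fragment would be pasted into the `field` list of the relevant RecordSet in the exported file, which should then be checked with mlcroissant since this sketch performs no validation of its own.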

benjelloun avatar Jun 05 '24 17:06 benjelloun

Are there any plans to expand the capabilities of the editor? My dataset has a fairly complex structure and the prospect of having to manually create a file with several hundred lines of esoteric machine-friendly metadata is daunting to say the least...

francois-rd avatar Jun 05 '24 17:06 francois-rd