[NEURIPS] Hosted Editor hides files with errors (cannot be deleted)

Open francois-rd opened this issue 1 year ago • 4 comments

Using the editor hosted on HuggingFace (https://huggingface.co/spaces/MLCommons/croissant-editor), I first accidentally added a FileObject instead of a FileSet. When I selected FileSet as that FileObject's parent, the editor hid the file (I'm assuming because an error was raised?). I corrected my mistake by adding two FileSets instead. Now the editor shows 3 resources on the overview tab, even though only two resources show up on the resources tab (see images). Furthermore, the overview tab highlights a number of errors related to that initial FileObject having unfilled fields (see image), so I cannot export my croissant metadata (the export button is not clickable).

Overview tab shows 3 resources: Screen Shot 2024-06-05 at 11 39 18

Resource tab shows only 2 resources (which is what I want): Screen Shot 2024-06-05 at 11 41 09

Overview tab shows errors relating to missing resource file: Screen Shot 2024-06-05 at 11 42 24

francois-rd, Jun 05 '24 15:06

Hi Francois,

If your data is in an archive, you should first add a FileObject for the archive file, and then a "child" FileSet with containedIn set to the FileObject.
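As a rough sketch of that layout (not output from the editor), the two resources could look like the following when written out as Croissant JSON-LD entries. The names, the URL, and the checksum placeholder are all hypothetical, and the `includes` pattern assumes the archive holds CSV files.

```python
import json

# Hypothetical archive resource: the name, URL and checksum are placeholders.
archive = {
    "@type": "cr:FileObject",
    "@id": "data-archive",
    "name": "data-archive",
    "contentUrl": "https://example.org/data.zip",
    "encodingFormat": "application/zip",
    "sha256": "<sha256 of the archive>",
}

# The "child" FileSet refers back to the archive through containedIn.
csv_files = {
    "@type": "cr:FileSet",
    "@id": "csv-files",
    "name": "csv-files",
    "containedIn": {"@id": archive["@id"]},
    "encodingFormat": "text/csv",
    "includes": "*.csv",  # assumed glob for the files inside the archive
}

distribution = [archive, csv_files]
print(json.dumps({"distribution": distribution}, indent=2))
```

The point of the sketch is only the parent/child direction: the FileObject describes the archive itself, and the FileSet names it in `containedIn`, not the other way around.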

I would recommend creating a new dataset from scratch in the editor... I'm not sure the one you currently have can be easily fixed.

Hope this helps, Omar

benjelloun, Jun 05 '24 17:06

> If your data is in an archive, you should first add a FileObject for the archive file, and then a "child" FileSet with containedIn set to the FileObject.

The problem is that the editor doesn't support uploading an archive file: Screen Shot 2024-06-05 at 13 35 03

> I would recommend creating a new dataset from scratch in the editor... I'm not sure the one you currently have can be easily fixed.

Fair enough. However, having to restart from scratch every time I make even a small mistake makes the editor not user-friendly in the slightest. I should be able to freely delete files added by mistake, rather than have the editor hide them from the UI while still complaining about their errors.

francois-rd, Jun 05 '24 17:06

Same issue with tar.xz files: when I provide a link and all required fields, the error message persists.

changliu98, Jun 05 '24 20:06

I encounter similar issues. The difference is that even when I upload unarchived .csv files, the editor still shows errors like "At least one of these properties should be defined: ['md5', 'sha256'].", "Property "https://schema.org/contentUrl" is mandatory, but does not exist.", "Property "https://schema.org/encodingFormat" is mandatory, but does not exist.", and "Node "xxx" is a field and has no source. Please, use http://mlcommons.org/croissant/source to specify the source."

For the "no source" issue in particular, the croissant documentation page the error points to, https://mlcommons.org/croissant/source, does not exist.
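For anyone checking their metadata by hand, here is a minimal sketch of what the quoted errors seem to be asking for. It is not the editor's or mlcroissant's actual validation: the function names are made up for illustration, and the dictionary keys (`contentUrl`, `encodingFormat`, `md5`, `sha256`, `field`, `source`) are assumed to match the compact JSON-LD form of the metadata.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the sha256 hex digest of a local file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_file_object_properties(file_object):
    """List the properties named in the quoted errors that are absent or empty."""
    problems = []
    for key in ("contentUrl", "encodingFormat"):
        if not file_object.get(key):
            problems.append(f"missing mandatory property: {key}")
    if not (file_object.get("md5") or file_object.get("sha256")):
        problems.append("missing checksum: need md5 or sha256")
    return problems


def fields_without_source(record_set):
    """Return the names of fields in a record set that declare no source."""
    return [
        field.get("name", field.get("@id", "?"))
        for field in record_set.get("field", [])
        if not field.get("source")
    ]


# Hypothetical, deliberately incomplete resource and record set.
incomplete = {"@type": "cr:FileObject", "name": "train.csv"}
records = {"field": [{"name": "label"}]}
print(missing_file_object_properties(incomplete))
print(fields_without_source(records))
```

A hidden FileObject left with empty fields, as described earlier in this thread, would fail all three property checks, which is consistent with the export staying blocked.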

XenonLamb, Jun 11 '24 22:06