[NEURIPS] `.zip` and `.tar.gz` archives are not supported for file uploading

Open amorehead opened this issue 1 year ago • 5 comments

  • .zip and .tar.gz archives are currently not supported for file uploading. What would be needed to implement this feature?

amorehead avatar May 25 '24 23:05 amorehead

This appears related to https://github.com/mlcommons/croissant/issues/547, so I am closing this issue.

amorehead avatar May 25 '24 23:05 amorehead

I am reopening this issue per the NeurIPS organizer's recommendation.

amorehead avatar Jun 05 '24 15:06 amorehead

Hi, I wanted to check what you mean by this. I have a FileObject with a content_url pointing to a publicly available .zip file, and my encoding_format is set to application/zip, similar to coco2014, but I'm getting the following errors:

ValueError: Unsupported compression method for file: ...

and

GenerationError: An error occurred during the streaming generation of the dataset, more specifically during the operation Extract(training_data).

Is this the same issue you're facing? I'm able to get it working if I don't upload a compressed .zip file.

Edit: I tried updating my content_url to refer to a local copy of the .zip instead and it works perfectly. I'm just not able to get it to work with a content_url that points to a remote .zip file.

JovinLeong avatar Jun 10 '24 12:06 JovinLeong

Hi, @JovinLeong. I believe the issue for me is that I'm trying to point to a remote .zip/.tar.gz archive. Good to know that local paths work though!

amorehead avatar Jun 11 '24 00:06 amorehead

Okay, then it seems like we're facing the same issue, which is odd since the coco2014 example uses a remote .zip. Though tbf it seems like coco2014 isn't working for me anyway.

JovinLeong avatar Jun 11 '24 02:06 JovinLeong
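
The workaround reported above (pointing content_url at a local copy of the archive rather than the remote URL) could be sketched roughly as follows. This is only an illustration using the Python standard library, not part of Croissant or mlcroissant: the helper name `fetch_archive` and the fallback file name are hypothetical, and it only covers the .zip case.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def fetch_archive(url: str, dest_dir: str) -> Path:
    """Download the archive at `url` into `dest_dir` and return its local path.

    Hypothetical helper for illustration only; not an mlcroissant API.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so the .zip extension survives locally.
    name = Path(urlparse(url).path).name or "archive.zip"
    target = dest / name

    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)

    # Fail early if the download is not actually a readable .zip file.
    if not zipfile.is_zipfile(target):
        raise ValueError(f"{target} is not a readable .zip archive")
    return target
```

The returned path is what one would then put in content_url by hand, as described in the edit to the comment above. Whether this avoids the `Unsupported compression method` error in every setup is an assumption based only on the reports in this thread.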