
Question: Is there a list of publicly available S3 links to datasets in `litdata.StreamingDataset` format?

Open 2catycm opened this issue 11 months ago • 3 comments

If there were a list of popular datasets that have already been preprocessed with litdata and uploaded to Lightning Studio or S3, this project would be much more usable for me.

For example, is there a publicly available streaming dataset for ImageNet?

2catycm avatar Dec 02 '24 08:12 2catycm

Hey @2catycm. Yes, there is. I haven't processed many datasets so far.

Here are my published Studios: https://lightning.ai/thomasgridai

The dataset is available under `s3://optimized-imagenet-1m/lightning_data_imagenet`, if I remember correctly.

tchaton avatar Dec 02 '24 09:12 tchaton
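
For readers who want to try the path from the reply above, here is a minimal sketch. It assumes that `litdata` is installed, that the bucket is still readable with your AWS credentials, and that the path is correct (the poster quoted it from memory). The helper names `split_s3_url` and `open_stream` are hypothetical and not part of litdata. The structure of each sample is not stated in this thread, so inspect the first item before building a training loop.

```python
from urllib.parse import urlparse

# Path quoted from the reply above; recalled from memory by the poster, so unverified.
IMAGENET_URL = "s3://optimized-imagenet-1m/lightning_data_imagenet"


def split_s3_url(url):
    """Split 's3://bucket/prefix' into (bucket, prefix); hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def open_stream(url=IMAGENET_URL):
    """Return a StreamingDataset that reads the optimized chunks at `url`."""
    # Imported lazily so this module loads even when litdata is not installed.
    from litdata import StreamingDataset

    # Assumption: the bucket is accessible from your environment.
    return StreamingDataset(url, shuffle=True)


if __name__ == "__main__":
    bucket, prefix = split_s3_url(IMAGENET_URL)
    print(f"bucket={bucket} prefix={prefix}")
    dataset = open_stream()
    # The sample layout is unknown here, so only report its type.
    print(type(dataset[0]))
```

Checking the URL shape first gives a clear error for a mistyped path before any network access is attempted.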

Hey @2catycm. We also added support for Hugging Face datasets.

tchaton avatar Feb 09 '25 08:02 tchaton

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '25 05:04 stale[bot]

Closing this issue for now. Please feel free to reopen in case you have any further questions. 😊

bhimrazy avatar Jun 04 '25 07:06 bhimrazy