litdata

Question: is there a plan to support streaming from GCS?

Open · dnnspark opened this issue 1 year ago · 6 comments

🚀 Feature

dnnspark, Apr 13 '24 01:04

Hi! Thanks for your contribution! Great first issue!

github-actions[bot], Apr 13 '24 01:04

Hey @dnnspark,

Yes, this is quite simple to add. It just needs a GCS downloader.

tchaton, Apr 14 '24 18:04
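The reply above suggests that GCS support mostly amounts to one more downloader. A minimal sketch of that pattern, assuming downloaders are selected by URL prefix: `LocalDownloader`, `GCSDownloader`, `DOWNLOADERS`, and `get_downloader` are hypothetical names for illustration, not litdata's internal API. The GCS class uses the `google-cloud-storage` client and assumes default application credentials are configured.

```python
"""Hypothetical sketch: choose a downloader by URL prefix (not litdata's actual internals)."""
import shutil
from pathlib import Path
from urllib.parse import urlparse


class LocalDownloader:
    """Copies a file from a local path (or file:// URL) into the cache directory."""

    def download(self, remote: str, local_path: str) -> None:
        src = urlparse(remote).path if remote.startswith("file://") else remote
        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, local_path)


class GCSDownloader:
    """Downloads a gs://bucket/key object with the google-cloud-storage client."""

    def download(self, remote: str, local_path: str) -> None:
        # Imported lazily so the module still works without the GCS SDK installed.
        from google.cloud import storage

        parsed = urlparse(remote)  # e.g. gs://my-bucket/path/to/chunk-0.bin
        bucket_name, blob_name = parsed.netloc, parsed.path.lstrip("/")
        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        client = storage.Client()
        client.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)


# Registry mapping URL prefixes to downloader classes; adding GCS is one new entry.
DOWNLOADERS = {
    "gs://": GCSDownloader,
    "file://": LocalDownloader,
}


def get_downloader(remote: str):
    """Returns a downloader instance for the given remote URL or path."""
    for prefix, cls in DOWNLOADERS.items():
        if remote.startswith(prefix):
            return cls()
    # Anything without a known prefix is treated as a plain local path.
    return LocalDownloader()
```

With this layout, supporting another backend means adding one class and one registry entry; the code that calls `get_downloader()` to fetch chunks does not need to change.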

Thanks @tchaton! Do you have an idea when it will land (even a very rough estimate)?

dnnspark, Apr 14 '24 23:04

Hey @dnnspark,

If you are willing to give it a try, I can look into it this week.

tchaton, Apr 15 '24 08:04

Sorry for the late reply, @tchaton.

I'm willing to try! It's not blocking at the moment, though, so I'll stay tuned for GCS support (it would be very helpful if you could ping this thread once it's ready).

One thing I noticed is that the `optimize()` function assumes the data is stored on local disk (at least in the example). In my case, the raw data is on GCS because it's too large. Is there a way to transform data stored in the cloud and save the transformed data back to the cloud without downloading the entire dataset?

dnnspark, Apr 16 '24 21:04

Yes, that's why this library was built :) But I would need to add GCS support for it first ;) I will try to prioritize it.

tchaton, Apr 17 '24 06:04
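The cloud-to-cloud workflow asked about above works because items can be read, transformed, and written in small batches, so the full dataset never has to sit on local disk. The sketch below illustrates only that idea; it is not how litdata's `optimize()` is implemented. `stream_transform`, `read_obj`, `write_obj`, and `items_per_chunk` are hypothetical names, and plain dicts stand in for the input and output buckets (in practice the two callables would wrap a storage client).

```python
"""Illustrative sketch: stream objects from one store, transform them, write chunks to another."""
import pickle
from typing import Callable, Dict, Iterable, List


def stream_transform(
    keys: Iterable[str],
    read_obj: Callable[[str], bytes],
    transform: Callable[[bytes], object],
    write_obj: Callable[[str, bytes], None],
    items_per_chunk: int = 2,
) -> int:
    """Reads one object at a time, transforms it, and uploads fixed-size chunks.

    Only the current chunk (at most items_per_chunk items) is held in memory.
    Returns the number of chunks written.
    """
    buffer: List[object] = []
    n_chunks = 0

    def flush() -> None:
        nonlocal n_chunks
        # Serialize the buffered items as one chunk, then start a fresh buffer.
        write_obj(f"chunk-{n_chunks}.bin", pickle.dumps(buffer))
        n_chunks += 1
        buffer.clear()

    for key in keys:
        buffer.append(transform(read_obj(key)))
        if len(buffer) == items_per_chunk:
            flush()
    if buffer:  # write the final, partially filled chunk
        flush()
    return n_chunks


# Usage: dicts stand in for a source bucket and a destination bucket.
src = {"a.txt": b"hello", "b.txt": b"gcs", "c.txt": b"litdata"}
dst: Dict[str, bytes] = {}
n = stream_transform(sorted(src), src.__getitem__, len, dst.__setitem__)
# n == 2: chunk-0.bin holds [5, 3] and chunk-1.bin holds [7]
```

Memory use is bounded by the chunk size rather than the dataset size, which is the property that makes transforming data directly between buckets practical.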

@dnnspark GCS support was merged.

tchaton, Jun 04 '24 06:06