Jay Chia

Results 126 comments of Jay Chia

Indeeeeed.... @universalmind303 already built some basic support: https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/huggingface.html I think there are still many things we can work on here (e.g. if HF can expose an S3 protocol instead, Daft...

Other workloads we're actively working on include Hugging Face's Datatrove https://github.com/huggingface/datatrove -- we've been running some really large deduplication/batch inference pipelines with Daft. Could be really cool getting that working e2e...

Some open questions on our end which we can discuss: 1. Can HF expose S3 protocols? That would greatly improve performance, potentially unlock writes, and also leverage a lot of...

Assigning to @ykdojo as a first task!

What do you think instead of:

```python
import daft

# `unity` is assumed to be an already-configured Unity Catalog client
volume = unity.load_volume("unity.default.images")
df = daft.from_unity_catalog_volume(volume)

# Shows references, file sizes etc
df.show()

# Download the raw bytes, then decode them; the second column builds on df_img
df_img = df.with_column("image_data_bytes", df["references"].unity_catalog.download(volume))
df_img = df_img.with_column("image", df_img["image_data_bytes"].image.decode())
```

Thanks, I have some ideas for getting this moving. Will make some progress on it this week.

You should be able to rebase locally and do a force push! Also thanks for the PR -- is Daft not already installed in editable mode for local development with...

To try the native runner (do not initialize Ray for this):

```python
# This is the default behavior if Ray is not initialized
daft.set_runner_native()

# Manually set the morsel size...
```

Hey there, curious as to how this should be integrated? What API does Celeborn expose that might be convenient for a Rust-based engine such as Daft?

> [@jaychia](https://github.com/jaychia), thanks for replying to this ticket. The design of the integration is to support a Celeborn Rust client that provides Daft with a Rust SDK. Celeborn provides the REST API...