[Enh]: Support Daft DataFrame

Open hongbo-miao opened this issue 9 months ago • 8 comments

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

We use Daft because it is a fast, unified engine for data analytics, engineering, and ML/AI: https://delta.io/blog/daft-delta-lake-integration/

[Image: performance chart made by the Daft team]

Please describe the purpose of the new feature or describe the problem to solve.

It would be great to support Daft DataFrame, thanks! ☺️

Upstream Issues

  • [x] https://github.com/Eventual-Inc/Daft/issues/4031
  • [x] https://github.com/Eventual-Inc/Daft/issues/4032
  • [ ] https://github.com/Eventual-Inc/Daft/issues/4033
  • [x] https://github.com/Eventual-Inc/Daft/issues/4094
  • [ ] https://github.com/Eventual-Inc/Daft/issues/4095
  • [ ] https://github.com/Eventual-Inc/Daft/issues/4096
  • [ ] https://github.com/Eventual-Inc/Daft/issues/4098
  • [ ] https://github.com/Eventual-Inc/Daft/issues/4151
  • [x] https://github.com/Eventual-Inc/Daft/issues/4220

hongbo-miao avatar Mar 16 '25 02:03 hongbo-miao

Hey @hongbo-miao, this is definitely in scope. We were waiting for a big refactor to land. Now that it has, we can start working towards supporting Daft, but I have the suspicion that @MarcoGorelli has already been cooking something recently 👀

FBruzzesi avatar Mar 16 '25 08:03 FBruzzesi

I've ~~got a local branch~~ added (#2223) with a ~~mostly~~ finished CompliantDataFrame (for the other parts, see #2202, #2119, #2064).

From a quick look at daft.DataFrame it looks like it would be using CompliantLazyFrame. Luckily, it'll be able to match (#2211) via daft.DataFrame.explain

It'll probably be easier to scope out the work after spec-ing CompliantLazyFrame - but that shouldn't take too long.


The API reference is an interesting read. It seems like a mix of polars, pyspark, pyarrow but also some Image operations that seem novel

dangotbanned avatar Mar 16 '25 11:03 dangotbanned

> I have the suspicion that @MarcoGorelli has already been cooking something recently 👀

😄 indeed, got something cooking, will update when it's in a more complete state

MarcoGorelli avatar Mar 30 '25 08:03 MarcoGorelli

> > I have the suspicion that @MarcoGorelli has already been cooking something recently 👀
>
> 😄 indeed, got something cooking, will update when it's in a more complete state

@MarcoGorelli you should've mentioned you've got a narwhals label in their repo!

I've linked all the issues you've sneakily opened 😉

dangotbanned avatar Apr 05 '25 11:04 dangotbanned

We're getting closer

@dangotbanned's work made this so much easier - well done Dan, thanks so much, it's hard to overstate how much impact you've had on this project 🙌

MarcoGorelli avatar May 02 '25 11:05 MarcoGorelli

> We're getting closer
>
> @dangotbanned's work made this so much easier - well done Dan, thanks so much, it's hard to overstate how much impact you've had on this project 🙌

Thanks @MarcoGorelli, really means a lot!

dangotbanned avatar May 02 '25 12:05 dangotbanned

Hey folks! Few questions here:

  1. Would narwhals use the daft executor directly under the hood? Or translate the daft.DataFrame to an internal representation to perform its operations? I'm mostly interested in how this interfaces with daft on ray
  2. How is the narwhals typing system mapped to daft's typing system? daft supports Tensors (including numpy arrays), for example.
  3. Would using narwhals remove daft's ability to attach GPUs to transforms?

NellyWhads avatar May 03 '25 02:05 NellyWhads

Hey @NellyWhads !

The idea is that if you have daft.DataFrame then you can pass that to nw.from_native. Then, any operation you perform using the Narwhals API gets mapped to the daft DataFrame API

Any characteristics of the original object (e.g. what it's connected to, where it runs) should remain unchanged
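
As a rough illustration of that delegation idea, here is a toy sketch in plain Python. It is not how Narwhals or Daft are actually implemented: `ToyNativeFrame`, `ToyWrapper`, the `runner` attribute, and this `from_native` are hypothetical names made up for the example. It only shows the general pattern of a thin wrapper that forwards each call to the native object, so whatever the native object carries (here, where it runs) is untouched.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


class ToyNativeFrame:
    """Hypothetical stand-in for a native engine's dataframe (NOT daft's real API)."""

    def __init__(self, rows: list[Row], runner: str = "local") -> None:
        self.rows = rows
        # Assumption for illustration: the native object owns its execution settings.
        self.runner = runner

    def where(self, predicate: Callable[[Row], bool]) -> ToyNativeFrame:
        # The native engine does the actual work and propagates its own settings.
        return ToyNativeFrame([r for r in self.rows if predicate(r)], self.runner)


@dataclass(frozen=True)
class ToyWrapper:
    """Hypothetical thin wrapper: holds the native object and never converts its data."""

    native: ToyNativeFrame

    def filter(self, predicate: Callable[[Row], bool]) -> ToyWrapper:
        # Map the wrapper's method name onto the native engine's method name.
        return ToyWrapper(self.native.where(predicate))

    def to_native(self) -> ToyNativeFrame:
        # Hand back the native object so native-only features remain available.
        return self.native


def from_native(obj: ToyNativeFrame) -> ToyWrapper:
    """Toy analogue of wrapping a native frame; no data is copied."""
    return ToyWrapper(obj)


native = ToyNativeFrame([{"a": 1}, {"a": 5}], runner="ray")
result = from_native(native).filter(lambda row: row["a"] > 2).to_native()
```

In this sketch `result` is still a `ToyNativeFrame` with the same `runner` as the input, which is the property described above: the wrapper only translates calls, and the native engine keeps doing the execution.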

MarcoGorelli avatar May 03 '25 07:05 MarcoGorelli

Hey @MarcoGorelli 👋

Do you have any rough timeline on this or is there anything one could do to support? :) We'd love to replace Dask with Daft through narwhals, if possible in any way 😬 EDIT: Ah oops, only now saw that there already is a WIP plugin available: https://github.com/MarcoGorelli/narwhals-daft

jonded94 avatar Nov 04 '25 12:11 jonded94

Hey @jonded94 !

If you want to try it out, you can do

pip install git+https://github.com/MarcoGorelli/narwhals.git@daft

Here's a little demo, showing tpc-h q1 (which does various aggregations and filters): https://www.kaggle.com/code/marcogorelli/daft-via-narwhals?scriptVersionId=243096444

As for something stable, narwhals-daft is indeed what you're after. We've just finished getting the plugin mechanism ready.

I don't want to make any promises as to when it'll be fully tested and ready, but if anyone's interested in funding the effort please contact [email protected] and we can promise it as a deliverable by some arranged date. It'll probably cost less than what you're expecting and will be well worth it

MarcoGorelli avatar Nov 04 '25 12:11 MarcoGorelli

@jonded94 I got a bit carried away, and narwhals-daft is now published and installable!

https://github.com/narwhals-dev/narwhals-daft

pip install narwhals-daft

and the rest should just work: https://www.kaggle.com/code/marcogorelli/daft-via-narwhals?scriptVersionId=275198341

Curious to hear how you get on with it!

And if anyone fancies helping out with the missing methods, here's the tracking issue: https://github.com/narwhals-dev/narwhals-daft/issues/35 🙏


one day https://github.com/ibis-project/ibis/issues/8904 might also happen

MarcoGorelli avatar Nov 10 '25 12:11 MarcoGorelli