narwhals icon indicating copy to clipboard operation
narwhals copied to clipboard

chore: Simplify `PandasLikeDataFrame|DaskLazyFrame.join` method

Open FBruzzesi opened this issue 7 months ago • 5 comments

What type of PR is this? (check all applicable)

  • [x] 💾 Refactor
  • [ ] ✨ Feature
  • [ ] 🐛 Bug Fix
  • [ ] 🔧 Optimization
  • [ ] 📝 Documentation
  • [ ] ✅ Test
  • [ ] 🐳 Other

Related issues

  • Closes #2488

Checklist

  • [x] Code follows style guide (ruff)
  • [ ] Tests added
  • [ ] Documented the changes

If you have comments or can explain your changes, please do so below

Reduces the scope by creating _<join_strategy>_join method, for each join strategy, to be called in join

FBruzzesi avatar May 08 '25 07:05 FBruzzesi

@FBruzzesi would you be open to flipping the method names around?

Most of the private ones that are related share a common prefix. That helps them show up together in autocomplete 🙂

I do understand why e.g (anti_join, left_join) make a lot of sense in isolation, but another way to think about it is:

DataFrame.join(how="anti") -> CompliantDataFrame._join_anti
DataFrame.join(how="left") -> CompliantDataFrame._join_left

Looking at it that way, we're just embedding the how as a specialization of join via a suffix

DaskLazyFrame

image

PandasLikeDataFrame

image

dangotbanned avatar May 09 '25 10:05 dangotbanned

@dangotbanned moved naming to _join_<method>

FBruzzesi avatar May 10 '25 07:05 FBruzzesi

@dangotbanned moved naming to _join_<method>

Thanks @FBruzzesi will try to do a proper review today 🙏

dangotbanned avatar May 10 '25 10:05 dangotbanned

Thanks @FBruzzesi - looking a lot better! 🥳

I've got a few suggestions on the pandas one, but I think the same would apply for dask as well

Thanks @dangotbanned - great work you've done! I will have time later today to go through each point

FBruzzesi avatar May 10 '25 17:05 FBruzzesi

I was thinking about simplifying BaseFrame.join_asof in a similar way to (https://github.com/narwhals-dev/narwhals/pull/2511/commits/1c1daae80ba81597ce117fa8e055d0053354991a)

I thought I'd take a look at the polars version(s).

They're simpler, and accept more parameters and types 🤔

  • https://github.com/pola-rs/polars/blob/34b5729cec09a8e3619c70844a77830c0613f070/py-polars/polars/dataframe/frame.py#L7614-L7630
  • https://github.com/pola-rs/polars/blob/34b5729cec09a8e3619c70844a77830c0613f070/py-polars/polars/lazyframe/frame.py#L5283-L5319

Maybe they simplified it at some point?

dangotbanned avatar May 10 '25 18:05 dangotbanned

https://github.com/narwhals-dev/narwhals/pull/2511#event-18343540555

@FBruzzesi you may have noticed I remembered about this PR in (https://github.com/narwhals-dev/narwhals/pull/2511#ref-pullrequest-3165108714) and then quickly lost track 😅

Feel free to ping me if I forget to review tomorrow!

dangotbanned avatar Jun 26 '25 20:06 dangotbanned

Mostly finished on the review, just need to add a comment

Long awaited review will be with you shortly 😉

dangotbanned avatar Jun 28 '25 11:06 dangotbanned

Thanks for all the support @dangotbanned I (should) have addressed all your points. Only discussion left is:

Regarding (https://github.com/narwhals-dev/narwhals/pull/2511#discussion_r2105922818), I still have a preference for my earlier suggestion.

but as of:

But, both are improvements over what we have on main - so feel free to keep that part as-is 🙂

for now yes I would rather keep it as is. My mind has not changed from the previous comment on it 😂

FBruzzesi avatar Jun 28 '25 13:06 FBruzzesi