polars icon indicating copy to clipboard operation
polars copied to clipboard

docs(python): More accurate and helpful docs for user defined functions

Open itamarst opened this issue 1 year ago • 1 comments
trafficstars

Fixes #14699

  1. Focuses purely on Python, as in the title of the article, with Rust APIs moved elsewhere.
  2. Doesn't give erroneous information about the limits of map_batches(), given implementation changes.
  3. Doesn't mention map_elements(), for simplicity's sake. This can eventually be added in a new section, presuming I can figure out why map_elements() even exists.
  4. Adds demonstration of how to write fast user functions.
  5. Probably other changes I've forgotten.

Potential problems introduced by this PR:

  • Numba sometimes takes a while to release support for a new Python major release. So documentation will have to be built with older version of Python for first few months.

itamarst avatar Mar 20 '24 18:03 itamarst

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.41%. Comparing base (332e40a) to head (ca8ba41).

:exclamation: Current head ca8ba41 differs from pull request most recent head 0778a44

Please upload reports for the commit 0778a44 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15194      +/-   ##
==========================================
+ Coverage   80.81%   81.41%   +0.59%     
==========================================
  Files        1464     1425      -39     
  Lines      192019   187964    -4055     
  Branches     2743     2704      -39     
==========================================
- Hits       155185   153027    -2158     
+ Misses      36323    34440    -1883     
+ Partials      511      497      -14     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 20 '24 19:03 codecov[bot]

Thanks for working on this!

Doesn't mention map_elements(), for simplicity's sake. This can eventually be added in a new section, presuming I can figure out why map_elements() even exists.

The purpose of map_elements is for elementwise UDFs. If you're calling a 3rd party library (e.g. libpostal) which works on one element at a time, then that's a good use case for map_elements. It's a last-resort, really, but I think it does need mentioning for those cases where nothing else will do. Polars can also (sometimes) warn the user when they could have done better :sunglasses:

Regarding Numba: given that is' quite dangerous to use it when the Series has missing values, I don't think it should currently be recommended in the user guide. I'd suggest either keeping it out for now, or sorting out Numba first

MarcoGorelli avatar Apr 07 '24 11:04 MarcoGorelli

Merged forward, ready for review again @MarcoGorelli.

itamarst avatar May 29 '24 14:05 itamarst

Thank you, I will address these.

itamarst avatar May 31 '24 18:05 itamarst

@MarcoGorelli back to you, either addressed or commented above.

itamarst avatar Jun 05 '24 15:06 itamarst

@MarcoGorelli I merged forward to deal with conflicts.

itamarst avatar Jun 11 '24 17:06 itamarst

@MarcoGorelli can we get this merged?

itamarst avatar Jun 14 '24 14:06 itamarst

Can you do a rebase?

ritchie46 avatar Jun 25 '24 15:06 ritchie46

Will do that now.

itamarst avatar Jun 25 '24 16:06 itamarst

My editor is having Issues so still need to manually verify that merge was correct.

itamarst avatar Jun 25 '24 17:06 itamarst

OK I think it's good now. Note that auto-merge won't happen because I edited something, you'll need re-enable it if you're happy.

itamarst avatar Jun 25 '24 17:06 itamarst