polars
polars copied to clipboard
docs(python): More accurate and helpful docs for user defined functions
Fixes #14699
- Focuses purely on Python, as in the title of the article, with Rust APIs moved elsewhere.
- Doesn't give erroneous information about the limits of
map_batches(), given implementation changes. - Doesn't mention
map_elements(), for simplicity's sake. This can eventually be added in a new section, presuming I can figure out whymap_elements()even exists. - Adds demonstration of how to write fast user functions.
- Probably other changes I've forgotten.
Potential problems introduced by this PR:
- Numba sometimes takes a while to release support for a new Python major release. So documentation will have to be built with older version of Python for first few months.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 81.41%. Comparing base (
332e40a) to head (ca8ba41).
:exclamation: Current head ca8ba41 differs from pull request most recent head 0778a44
Please upload reports for the commit 0778a44 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## main #15194 +/- ##
==========================================
+ Coverage 80.81% 81.41% +0.59%
==========================================
Files 1464 1425 -39
Lines 192019 187964 -4055
Branches 2743 2704 -39
==========================================
- Hits 155185 153027 -2158
+ Misses 36323 34440 -1883
+ Partials 511 497 -14
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Thanks for working on this!
Doesn't mention map_elements(), for simplicity's sake. This can eventually be added in a new section, presuming I can figure out why map_elements() even exists.
The purpose of map_elements is for elementwise UDFs. If you're calling a 3rd party library (e.g. libpostal) which works on one element at a time, then that's a good use case for map_elements. It's a last-resort, really, but I think it does need mentioning for those cases where nothing else will do. Polars can also (sometimes) warn the user when they could have done better :sunglasses:
Regarding Numba: given that is' quite dangerous to use it when the Series has missing values, I don't think it should currently be recommended in the user guide. I'd suggest either keeping it out for now, or sorting out Numba first
Merged forward, ready for review again @MarcoGorelli.
Thank you, I will address these.
@MarcoGorelli back to you, either addressed or commented above.
@MarcoGorelli I merged forward to deal with conflicts.
@MarcoGorelli can we get this merged?
Can you do a rebase?
Will do that now.
My editor is having Issues so still need to manually verify that merge was correct.
OK I think it's good now. Note that auto-merge won't happen because I edited something, you'll need re-enable it if you're happy.