Implemented NumbaExecutionEngine
- [ ] ~closes #xxxx (Replace xxxx with the GitHub issue number)~
- [x] Tests added and passed if fixing a bug or adding a new feature
- [x] All code checks passed.
- [ ] ~Added type annotations to new arguments/methods/functions.~
- [x] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
Implements NumbaExecutionEngine for #61458
Docstring is currently a placeholder.
Hi @datapythonista, I was seeing a CI error because Numba isn’t installed in the test environment. I tried to guard against it with a `try`/`except` block, but that doesn't seem to work. Do you have any advice on how to move forward?
This is what we use for optional dependencies in tests: https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/test_iceberg.py#L21
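For anyone following along, here is a minimal sketch of that kind of guard. It assumes the linked line uses `pytest.importorskip`; the linked file is not reproduced here, and the test name and the `double` function are made up for illustration.

```python
import pytest


def test_numba_function_sketch():
    # Hypothetical test: importorskip returns the module when it is
    # installed and marks the test as skipped (not failed) when it is not.
    numba = pytest.importorskip("numba")

    @numba.njit
    def double(x):
        return x * 2

    assert double(21) == 42
```

Calling `pytest.importorskip` at module level instead skips every test in the file when the dependency is missing.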
Thanks for the clarification in the comment @datapythonista
Just to clarify, for the `NumbaExecutionEngine`, this means that instead of having the condition `if engine == "numba"` inside `apply_raw`, we should call `NumbaExecutionEngine.apply` directly in `DataFrame.apply` and let the engine handle all the cases that Numba does not support.
This would involve:
- removing all instances of `if engine == "numba"` in `apply.py` and moving them into `NumbaExecutionEngine.apply`
- calling `NumbaExecutionEngine.apply` directly in `DataFrame.apply` instead of `apply_raw`
Correct
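A rough sketch of the dispatch described above, in plain Python so it runs without pandas or Numba. Every name here (`PythonEngineSketch`, `NumbaEngineSketch`, `dataframe_apply_sketch`) is hypothetical, the data is a list of rows rather than a DataFrame, and the "kwargs are rejected" rule is only a placeholder restriction, not a statement about what Numba supports.

```python
class PythonEngineSketch:
    """Hypothetical stand-in for the default pure-Python path."""

    @staticmethod
    def apply(data, func, axis, args=(), kwargs=None):
        kwargs = kwargs or {}
        # axis=1 walks rows, axis=0 walks columns.
        lines = data if axis == 1 else zip(*data)
        return [func(list(line), *args, **kwargs) for line in lines]


class NumbaEngineSketch:
    """Hypothetical stand-in for NumbaExecutionEngine (no real Numba here)."""

    @staticmethod
    def apply(data, func, axis, args=(), kwargs=None):
        # The engine itself rejects what it cannot handle, so the caller
        # needs no `if engine == "numba"` branch. Placeholder restriction.
        if kwargs:
            raise NotImplementedError("kwargs are rejected in this sketch")
        lines = data if axis == 1 else zip(*data)
        return [func(list(line), *args) for line in lines]


def dataframe_apply_sketch(data, func, axis=0, engine=None, args=(), kwargs=None):
    # The caller only picks an engine and forwards everything to it.
    chosen = PythonEngineSketch if engine is None else engine
    return chosen.apply(data, func, axis, args, kwargs)
```

With this shape, the unsupported-case checks live in one place (the engine) and the calling code stays engine-agnostic.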
Sorry, I added some comments earlier, but I just realized I never submitted the review that contained them. They explain the idea of how to call `NumbaExecutionEngine` from `DataFrame.apply`.
Hey @datapythonista, the current Numba implementation for apply when `raw=False` is written inside `FrameApply` (e.g., `apply_series_numba()`) and relies on internal state (`self.obj`, `self.axis`, etc.).
As part of moving toward the new engine interface, should we explicitly rewrite this logic inside NumbaExecutionEngine, passing in the required values (like data, func, axis, etc.)?
Yes, the new interface already receives that information as parameters
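To illustrate the kind of rewrite being discussed, here is a small before/after sketch in plain Python. Both classes are hypothetical simplifications (list-of-rows data, no Numba), not the actual `FrameApply` or engine code. The point is only that the same logic can read its inputs from parameters instead of from `self`.

```python
class FrameApplySketch:
    """Hypothetical 'before': the logic reads its inputs from instance state."""

    def __init__(self, obj, func, axis):
        self.obj = obj
        self.func = func
        self.axis = axis

    def apply_series(self):
        lines = self.obj if self.axis == 1 else zip(*self.obj)
        return [self.func(list(line)) for line in lines]


class EngineSketch:
    """Hypothetical 'after': the same logic, with every input a parameter."""

    @staticmethod
    def apply(data, func, axis):
        lines = data if axis == 1 else zip(*data)
        return [func(list(line)) for line in lines]


data = [[1, 2, 3], [4, 5, 6]]
before = FrameApplySketch(data, max, axis=0).apply_series()
after = EngineSketch.apply(data, max, axis=0)
assert before == after
```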
Hi @datapythonista, I’ve finished the NumbaExecutionEngine implementation for apply and removed all instances of numba from FrameApply. All tests are passing. Let me know if there’s anything else you'd like me to follow up on!
Regarding your comment here. I think it would make sense to remove all references to engine and engine_kwargs from FrameApply since it only handles the python engine. What do you think?
It makes sense, I think both can be removed. Thanks!
Hey @datapythonista, are there any blockers preventing this from being merged? Happy to help with any changes if needed!
Hi @mroeschke, just following up on this. Let me know if you'd like any changes, otherwise I think it's good to go.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.