pandas

Implemented NumbaExecutionEngine

Open arthurlw opened this issue 6 months ago • 3 comments

  • [ ] ~closes #xxxx (Replace xxxx with the GitHub issue number)~
  • [x] Tests added and passed if fixing a bug or adding a new feature
  • [x] All code checks passed.
  • [ ] ~Added type annotations to new arguments/methods/functions.~
  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Implements NumbaExecutionEngine for #61458

Docstring is currently a placeholder.

arthurlw avatar May 23 '25 23:05 arthurlw

Hi @datapythonista, I was seeing a CI error because Numba isn’t installed in the test environment. I tried to guard against it with a try/except block, but that doesn't seem to work. Do you have any advice on how to move forward?

arthurlw avatar Jun 03 '25 15:06 arthurlw

This is what we use for optional dependencies in tests: https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/test_iceberg.py#L21
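
For readers without the link open, here is a minimal sketch of that kind of guard, assuming the pattern in question is `pytest.importorskip`. The test name and the jitted helper are hypothetical and are not taken from the pandas test suite.

```python
import pytest


def test_jit_function_adds_one():
    # Skip this test (rather than fail it) when the optional
    # dependency is not installed in the environment.
    numba = pytest.importorskip("numba")

    # Hypothetical function under test; only reached when numba imports.
    @numba.njit
    def add_one(x):
        return x + 1

    assert add_one(1) == 2
```

Calling `pytest.importorskip` at module level instead skips every test in the file, which suits a test module that is entirely about the optional dependency.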

datapythonista avatar Jun 03 '25 18:06 datapythonista

Thanks for the clarification in the comment @datapythonista

Just to clarify, for the NumbaExecutionEngine, this means that instead of having the condition if engine == "numba" inside apply_raw, we should call NumbaExecutionEngine.apply directly in DataFrame.apply and let the engine handle all the cases that Numba does not support.

This would involve:

  1. removing all instances of if engine == "numba" in apply.py and moving them into NumbaExecutionEngine.apply
  2. calling NumbaExecutionEngine.apply directly in DataFrame.apply instead of apply_raw
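
The two steps above can be sketched roughly as follows. This is a toy model, not pandas code: `ToyNumbaEngine`, `toy_frame_apply`, and the list-of-rows data are hypothetical stand-ins, and the "unsupported case" (keyword arguments) is invented purely for illustration.

```python
class ToyNumbaEngine:
    """Hypothetical stand-in for NumbaExecutionEngine (not the pandas class)."""

    @staticmethod
    def apply(data, func, args, kwargs, axis):
        # Step 1: the engine itself rejects what it cannot handle, so the
        # caller no longer needs `if engine == "numba"` branches.
        if kwargs:
            raise NotImplementedError("toy engine: keyword arguments unsupported")
        if axis == 0:
            # Apply func to each column of the row-major data.
            return [func(list(col), *args) for col in zip(*data)]
        # Apply func to each row.
        return [func(row, *args) for row in data]


def toy_frame_apply(data, func, axis=0, engine=None, args=(), kwargs=None):
    """Hypothetical stand-in for DataFrame.apply."""
    kwargs = kwargs or {}
    if engine is not None:
        # Step 2: hand everything to the engine directly.
        return engine.apply(data, func, args, kwargs, axis)
    # Default pure-Python path, free of engine-specific conditions.
    if axis == 0:
        return [func(list(col), *args, **kwargs) for col in zip(*data)]
    return [func(row, *args, **kwargs) for row in data]
```

With this shape, the default path never mentions a specific engine, and each engine decides for itself which inputs to reject.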

arthurlw avatar Jun 16 '25 03:06 arthurlw

Correct

datapythonista avatar Jun 16 '25 09:06 datapythonista

Sorry, I added some comments earlier, but I just realized I never submitted the review containing them. They explain how to call NumbaExecutionEngine from DataFrame.apply.

datapythonista avatar Jun 16 '25 09:06 datapythonista

Hey @datapythonista, the current Numba implementation for apply when raw=False is written inside FrameApply (e.g., apply_series_numba()) and relies on internal state (self.obj, self.axis, etc.).

As part of moving toward the new engine interface, should we explicitly rewrite this logic inside NumbaExecutionEngine, passing in the required values (like data, func, axis, etc.)?

arthurlw avatar Jun 18 '25 04:06 arthurlw

> As part of moving toward the new engine interface, should we explicitly rewrite this logic inside NumbaExecutionEngine, passing in the required values (like data, func, axis, etc.)?

Yes, the new interface already receives that information as parameters
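
As a rough illustration of that move from internal state to explicit parameters, the sketch below contrasts the two styles. Both classes are hypothetical and simplified; they do not reproduce FrameApply or the real engine signature.

```python
def _iter_axis(data, axis):
    # axis=1 iterates over rows; axis=0 iterates over columns.
    return data if axis == 1 else [list(col) for col in zip(*data)]


class StatefulApply:
    """Hypothetical: the logic reads its inputs from instance attributes."""

    def __init__(self, obj, func, axis):
        self.obj = obj
        self.func = func
        self.axis = axis

    def apply_series(self):
        return [self.func(values) for values in _iter_axis(self.obj, self.axis)]


class StatelessEngine:
    """Hypothetical: the same logic, with every input passed explicitly."""

    @staticmethod
    def apply(data, func, axis):
        return [func(values) for values in _iter_axis(data, axis)]
```

The stateless form can be called without constructing an apply object first, which is what lets the logic live in a separate engine class.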

datapythonista avatar Jun 18 '25 07:06 datapythonista

Hi @datapythonista, I’ve finished the NumbaExecutionEngine implementation for apply and removed all instances of numba from FrameApply. All tests are passing. Let me know if there’s anything else you'd like me to follow up on!

arthurlw avatar Jun 20 '25 07:06 arthurlw

Regarding your comment here. I think it would make sense to remove all references to engine and engine_kwargs from FrameApply since it only handles the python engine. What do you think?

arthurlw avatar Jun 23 '25 14:06 arthurlw

It makes sense, I think both can be removed. Thanks!

datapythonista avatar Jun 23 '25 15:06 datapythonista

Hey @datapythonista, are there any blockers preventing this from being merged? Happy to help with any changes if needed!

arthurlw avatar Jul 03 '25 18:07 arthurlw

Hi @mroeschke, just following up on this. Let me know if you'd like any changes, otherwise I think it's good to go.

arthurlw avatar Jul 31 '25 03:07 arthurlw

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions[bot] avatar Oct 07 '25 00:10 github-actions[bot]