[FEA] Make the cudf.pandas profiler show time taken by other, non-pandas functions
Description
Closes #15482
Checklist
- [ ] I am familiar with the Contributing Guidelines.
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
%%cudf.pandas.profile
def func1():
    time.sleep(1)
def func2():
    s = pd.Series([1, 2, 3])
    s.max()
    s.min()
func1()
func2()
Total time elapsed: 1.485 seconds
3 GPU function calls in 0.055 seconds
940 CPU function calls in 7.838 seconds
Stats
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ other │ 0 │ 0.000 │ 0.000 │ 940 │ 7.838 │ 0.008 │
│ Series │ 1 │ 0.028 │ 0.028 │ 0 │ 0.000 │ 0.000 │
│ Series.max │ 1 │ 0.015 │ 0.015 │ 0 │ 0.000 │ 0.000 │
│ Series.min │ 1 │ 0.012 │ 0.012 │ 0 │ 0.000 │ 0.000 │
└────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
Not all pandas operations ran on the GPU. The following functions required CPU fallback:
- other
To request GPU support for any of these functions, please file a Github issue here:
[https://github.com/rapidsai/cudf/issues/new/choose](https://github.com/rapidsai/cudf/issues/new?assignees=&labels=%3F+-+Needs+Triage%2C+feature+request&projects=&template=pandas_function_request.md&title=%5BFEA%5D).
Rich supports adding sections. If we're going to group all non-pandas calls into an "other" entry, can we add a line between that and all other rows so that there's a clear visual indication that "other" isn't some pandas function pd.other(...) but rather a separate grouping of all other functions? Also a name with spaces "All other functions" or similar would help clarify that too.
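A minimal sketch of that suggestion, assuming the stats are rendered with a `rich.table.Table`. The helper names `build_stats_table` and `render` are hypothetical and not part of cudf, and the rows are copied from the output above with the per-call columns omitted for brevity. Passing `end_section=True` to `add_row` draws a divider under that row, which separates the pandas entries from the grouped entry:

```python
import io

from rich.console import Console
from rich.table import Table

# Rows copied from the profiler output above (per-call columns omitted).
PANDAS_ROWS = [
    ("Series", "1", "0.028", "0", "0.000"),
    ("Series.max", "1", "0.015", "0", "0.000"),
    ("Series.min", "1", "0.012", "0", "0.000"),
]
OTHER_ROW = ("All other functions", "0", "0.000", "940", "7.838")


def build_stats_table(pandas_rows, other_row):
    """Build a table with a divider between pandas rows and the grouped row."""
    table = Table(title="Stats")
    table.add_column("Function")
    for header in ("GPU ncalls", "GPU cumtime", "CPU ncalls", "CPU cumtime"):
        table.add_column(header, justify="right")
    last = len(pandas_rows) - 1
    for i, row in enumerate(pandas_rows):
        # end_section=True on the final pandas row draws a line below it.
        table.add_row(*row, end_section=(i == last))
    table.add_row(*other_row)
    return table


def render(table, width=100):
    """Render the table to a plain string (hypothetical helper for the demo)."""
    console = Console(file=io.StringIO(), width=width)
    console.print(table)
    return console.file.getvalue()


if __name__ == "__main__":
    print(render(build_stats_table(PANDAS_ROWS, OTHER_ROW)))
```

The grouped row is placed last here only to keep the divider logic simple. Placing it first with `end_section=True` on that row would work the same way.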
I have been using %%cudf.pandas.profile for a while now to create demos, so I wanted to weigh in. The 'other' section might confuse users about what it contains. When I look at the current profiler results (with no 'other' entry), it didn't take me long to realize that it only captures pandas functions, so it isn't a confusing UX IMO.
If you think it's confusing, another option is to inform the user about the time discrepancy after the table, with something like "Note: Profiler will only list pandas function calls" or similar.
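As a rough sketch of that alternative, the note could be derived from the total elapsed time and the per-function times already in the table. The function name `discrepancy_note` and the exact wording are assumptions for illustration, not existing cudf behavior:

```python
def discrepancy_note(total_elapsed, profiled_seconds):
    """Return a note explaining time not attributed to pandas calls.

    total_elapsed: wall-clock seconds for the whole cell.
    profiled_seconds: cumulative seconds of each listed pandas call.
    """
    # Clamp at zero in case the listed times exceed the wall-clock total.
    unaccounted = max(total_elapsed - sum(profiled_seconds), 0.0)
    return (
        "Note: Profiler will only list pandas function calls; "
        f"{unaccounted:.3f} of {total_elapsed:.3f} seconds "
        "were spent outside them."
    )


if __name__ == "__main__":
    # Values taken from the example output above.
    print(discrepancy_note(1.485, [0.028, 0.015, 0.012]))
```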
I agree, I think if we're going to include the "other" data we have to be careful what UX we present so that it alleviates rather than increases confusion.
Retargeting to 24.12. I need to investigate how to time functions called in the %%cudf.pandas.profile cell.
This PR is currently blocked. I'm closing it for now until we discuss offline how to include only the non-cudf.pandas function calls that were actually made in the %%cudf.pandas.profile cell in the "other" section.
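For the offline discussion, one possible direction is sketched below. It is not how the cudf.pandas profiler works; it is a standalone illustration that uses only the standard library. The idea is to compile the cell source under a marker filename and use `sys.setprofile` to time only Python calls whose caller is the cell's top-level code. The names `CELL_FILENAME` and `profile_direct_calls` are hypothetical:

```python
import sys
import time
from collections import defaultdict

CELL_FILENAME = "<profiled-cell>"  # hypothetical marker for the cell's code


def profile_direct_calls(source):
    """Time only Python calls made directly from the cell's top-level code."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    starts = {}

    def tracer(frame, event, arg):
        caller = frame.f_back
        # Skip frames that were not called from the cell's module-level code.
        if (
            caller is None
            or caller.f_code.co_filename != CELL_FILENAME
            or caller.f_code.co_name != "<module>"
        ):
            return
        if event == "call":
            starts[frame] = time.perf_counter()
        elif event == "return" and frame in starts:
            name = frame.f_code.co_name
            totals[name] += time.perf_counter() - starts.pop(frame)
            counts[name] += 1

    code = compile(source, CELL_FILENAME, "exec")
    sys.setprofile(tracer)
    try:
        exec(code, {"__name__": "__main__"})
    finally:
        sys.setprofile(None)
    return dict(counts), dict(totals)


CELL_SOURCE = """
import time

def func1():
    time.sleep(0.05)

def helper():
    return sum(range(10))

def func2():
    return helper()

func1()
func2()
"""

if __name__ == "__main__":
    call_counts, call_totals = profile_direct_calls(CELL_SOURCE)
    for name, seconds in call_totals.items():
        print(f"{name}: {call_counts[name]} call(s), {seconds:.3f} s")
```

Because only calls made from the cell's top level are timed, nested calls such as `helper` are folded into their caller and are not counted a second time. Calls to C functions made directly from the cell are not captured by this sketch.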