[FEA] Make the cudf.pandas profiler show time taken by other, non-pandas functions

Open Matt711 opened this issue 1 year ago • 4 comments

Description

Closes #15482

Checklist

  • [ ] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

Matt711 avatar Aug 09 '24 19:08 Matt711

%%cudf.pandas.profile
import time

import pandas as pd

def func1():
    time.sleep(1)

def func2():
    s = pd.Series([1, 2, 3])
    s.max()
    s.min()

func1()
func2()

Total time elapsed: 1.485 seconds
                                 3 GPU function calls in 0.055 seconds                          
                                940 CPU function calls in 7.838 seconds                         
                                                                                                
                                                 Stats                                          
                                                                                                
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function   ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ other      │ 0          │ 0.000       │ 0.000       │ 940        │ 7.838       │ 0.008       │
│ Series     │ 1          │ 0.028       │ 0.028       │ 0          │ 0.000       │ 0.000       │
│ Series.max │ 1          │ 0.015       │ 0.015       │ 0          │ 0.000       │ 0.000       │
│ Series.min │ 1          │ 0.012       │ 0.012       │ 0          │ 0.000       │ 0.000       │
└────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
Not all pandas operations ran on the GPU. The following functions required CPU fallback:

- other

To request GPU support for any of these functions, please file a Github issue here: 
[https://github.com/rapidsai/cudf/issues/new/choose](https://github.com/rapidsai/cudf/issues/new?assignees=&labels=%3F+-+Needs+Triage%2C+feature+request&projects=&template=pandas_function_request.md&title=%5BFEA%5D).

Matt711 avatar Aug 09 '24 19:08 Matt711

Rich supports adding sections to a table. If we're going to group all non-pandas calls into an "other" entry, can we add a line between it and the remaining rows so that there's a clear visual indication that "other" isn't some pandas function pd.other(...) but rather a separate grouping of everything else? A name with spaces, such as "All other functions", would help clarify that too.

vyasr avatar Aug 16 '24 17:08 vyasr
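
The sectioning idea above can be sketched with Rich's `Table.add_section()`, which draws a rule after the last row added so far. This is only an illustration, not the profiler's actual rendering code: `build_stats_table`, `COLUMNS`, and the reduced column set are assumptions, and the numbers are copied from the sample output earlier in the thread.

```python
from rich.console import Console
from rich.table import Table

# Reduced column set for illustration (the real profiler table has more columns).
COLUMNS = ("Function", "GPU ncalls", "GPU cumtime", "CPU ncalls", "CPU cumtime")


def build_stats_table(pandas_rows, other_row):
    """Hypothetical helper: pandas rows first, then a rule, then the catch-all row."""
    table = Table(title="Stats")
    for name in COLUMNS:
        table.add_column(name)
    for row in pandas_rows:
        table.add_row(*row)
    # Draw a horizontal rule after the last pandas row.
    table.add_section()
    table.add_row(*other_row)
    return table


pandas_rows = [
    ("Series", "1", "0.028", "0", "0.000"),
    ("Series.max", "1", "0.015", "0", "0.000"),
    ("Series.min", "1", "0.012", "0", "0.000"),
]
other_row = ("All other functions", "0", "0.000", "940", "7.838")

table = build_stats_table(pandas_rows, other_row)
console = Console(record=True, width=100)
console.print(table)
text = console.export_text()  # plain-text copy of what was printed
```

With the rule in place and a name containing spaces, the last row reads as a separate grouping rather than as a pandas function.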

I have been using %%cudf.pandas.profile for a while now to create demos, so I wanted to weigh in. An "other" section might confuse users about what it contains. When I look at the current profiler results (which have no "other" entry), it didn't take me long to realize that the profiler only captures pandas functions, so the current UX isn't confusing IMO.

If you think it's confusing, another option is to inform the user about the time discrepancy after the table, with something like "Note: Profiler will only list pandas function calls".

singhmanas1 avatar Aug 16 '24 18:08 singhmanas1
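
A minimal sketch of the note suggested above, assuming the profiler already knows the total elapsed time and the cumulative time of each listed pandas call. `discrepancy_note` is a hypothetical helper, not part of cudf.pandas, and the figures come from the sample output earlier in the thread. Summing cumulative times is only meaningful when the listed calls are not nested inside each other, as is the case in that sample.

```python
def discrepancy_note(total_elapsed, profiled_times):
    """Hypothetical helper: explain the gap between wall time and listed calls."""
    profiled = sum(profiled_times)
    # Clamp at zero in case rounding makes the listed time exceed the total.
    unaccounted = max(total_elapsed - profiled, 0.0)
    return (
        "Note: the profiler lists only pandas function calls. "
        f"{unaccounted:.3f} of {total_elapsed:.3f} seconds were spent outside them."
    )


# Series, Series.max, and Series.min from the sample run.
note = discrepancy_note(1.485, [0.028, 0.015, 0.012])
print(note)
```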

I agree. If we're going to include the "other" data, we have to be careful about the UX we present so that it alleviates rather than increases confusion.

vyasr avatar Aug 16 '24 18:08 vyasr

Retargeting to 24.12. I need to investigate how to time functions called in the %%cudf.pandas.profile cell.

Matt711 avatar Sep 24 '24 03:09 Matt711

This PR is currently blocked. I'm closing it for now until we discuss offline how to include only the non-cudf.pandas function calls that were actually made in the %%cudf.pandas.profile cell in the "other" section.

Matt711 avatar Oct 07 '24 18:10 Matt711
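
One possible way to restrict timing to functions called directly from the cell is sketched below, using the standard library's `sys.setprofile`. It is not how the cudf.pandas profiler works and not a proposed fix for this PR. `profile_cell` and `CELL_FILENAME` are hypothetical names, and the sketch ignores edge cases such as imports inside the cell, class bodies, generators, and C functions.

```python
import sys
import time
from collections import defaultdict

CELL_FILENAME = "<profiled-cell>"  # hypothetical tag for code compiled from the cell


def profile_cell(source, namespace=None):
    """Run `source` and time only Python functions called from the cell's top level."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    starts = {}

    def tracer(frame, event, arg):
        caller = frame.f_back
        # Keep only frames whose direct caller is the cell's module-level code.
        if (
            caller is None
            or caller.f_code.co_filename != CELL_FILENAME
            or caller.f_code.co_name != "<module>"
        ):
            return
        if event == "call":
            starts[id(frame)] = time.perf_counter()
        elif event == "return":
            start = starts.pop(id(frame), None)
            if start is not None:
                name = frame.f_code.co_name
                counts[name] += 1
                totals[name] += time.perf_counter() - start

    code = compile(source, CELL_FILENAME, "exec")
    env = {"__name__": "__main__"}
    env.update(namespace or {})
    sys.setprofile(tracer)
    try:
        exec(code, env)
    finally:
        sys.setprofile(None)
    return dict(counts), dict(totals)


cell = """
def slow():
    time.sleep(0.05)

def inner():
    pass

def outer():
    inner()

slow()
outer()
outer()
"""

counts, totals = profile_cell(cell, {"time": time})
print(counts)
```

Because `inner` is only ever called from `outer`, it is folded into `outer`'s time instead of being counted separately, which avoids summing nested cumulative times.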