
[FEA] Provide more information on reasons for CPU fallback in the cudf.pandas profiler

Open vyasr opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.
Currently, the cudf.pandas profiler can be used to determine which parts of a workflow run on the GPU (with cudf) and which parts fall back to the CPU (pandas). The original idea was that users would identify these edge cases and support feature requests if there was missing functionality. This is great for us and for long-term improvements, but it does not help users in the short term.

Given our generally high coverage of the pandas API (especially frequently used APIs), most CPU fallback will occur not because an API is completely unsupported, but because some parameter is unsupported or the data cannot be processed (e.g., unsupported dtypes). In at least some of these cases, the user may be able to modify their code immediately to get things working. Without more information on the reasons for failure, though, the user would have to spelunk into the codebase to figure out what changes they might be able to make.

Describe the solution you'd like
In addition to reporting that fallback occurs, the profiler should also report why fallback occurred. The easiest way to accomplish this would be to capture the exception raised during the attempted cudf call and include that in the profiler output.
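
To make the proposal concrete, here is a minimal sketch of the capture-and-record idea in plain Python. It does not use cudf or reflect how cudf.pandas is actually implemented; `call_with_fallback`, `fallback_reasons`, `fast_sum`, and `slow_sum` are all hypothetical names invented for illustration.

```python
from collections import Counter

# Hypothetical store of fallback reasons, keyed by (function name, reason).
fallback_reasons = Counter()


def call_with_fallback(name, fast_func, slow_func, *args, **kwargs):
    """Try the fast path; on failure, record why and use the slow path."""
    try:
        return fast_func(*args, **kwargs)
    except Exception as exc:
        # Keep the exception type and message so a report can show them.
        reason = f"{type(exc).__name__}: {exc}"
        fallback_reasons[(name, reason)] += 1
        return slow_func(*args, **kwargs)


def fast_sum(values, skipna=True):
    # Stand-in for a GPU implementation that lacks support for one parameter.
    if not skipna:
        raise NotImplementedError("skipna=False is not supported")
    return sum(values)


def slow_sum(values, skipna=True):
    # Stand-in for the CPU implementation that accepts every parameter.
    return sum(values)


result_fast = call_with_fallback("sum", fast_sum, slow_sum, [1, 2, 3])
result_slow = call_with_fallback("sum", fast_sum, slow_sum, [1, 2, 3], skipna=False)

for (func_name, reason), count in fallback_reasons.items():
    print(f"{func_name}: fell back {count} time(s) because {reason}")
```

In this sketch the second call falls back, and the recorded reason tells the user that dropping `skipna=False` would keep the call on the fast path.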

Describe alternatives you've considered
None.

Additional context
Since some exceptions may be very verbose, we will need to do some work to consider how best to display the output without making the profiler output unreadable.
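
One possible way to handle verbose exceptions, sketched here purely as an assumption and not as a design decision, is to reduce each one to a single truncated line before display. The helper name `summarize_exception` and the `max_len` parameter are hypothetical.

```python
def summarize_exception(exc, max_len=80):
    """Return 'ExcType: first line of message', cut to at most max_len characters."""
    message = str(exc).strip()
    first_line = message.splitlines()[0] if message else ""
    exc_name = type(exc).__name__
    summary = f"{exc_name}: {first_line}" if first_line else exc_name
    if len(summary) > max_len:
        # Reserve three characters for the ellipsis marking the truncation.
        summary = summary[: max_len - 3] + "..."
    return summary


short = summarize_exception(TypeError("unsupported dtype\nlong traceback details"))
long = summarize_exception(ValueError("x" * 200), max_len=40)
print(short)
print(long)
```

The full exception text could still be kept elsewhere (for example behind a verbose flag) so that no detail is lost; that choice is left open here.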

vyasr · Oct 10 '24 16:10