
[FEA] Ensure python/arrow tasks work with the GpuSemaphore stack reporting

Open abellina opened this issue 3 years ago • 0 comments

As @revans2 mentions in https://github.com/NVIDIA/spark-rapids/pull/6810#discussion_r996044048, the Python worker's interaction with the GpuSemaphore is more complicated than one thread per task. I am filing this issue to investigate that edge case. Ideally, when a job uses the Python interface, we should still be able to dump the Java stack traces showing where we last saw these threads.
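To make the goal concrete, here is a minimal sketch of tracking more than one thread per task and dumping their Java stacks. It is not the spark-rapids implementation: `TaskThreadTracker`, `register`, `taskDone`, and `dumpStacks` are hypothetical names used only for illustration, and the sketch assumes each thread that touches the semaphore on behalf of a task can be registered under that task's attempt ID.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of spark-rapids: remembers every thread seen
// working for a task so that their stacks can be dumped later.
final class TaskThreadTracker {
    // task attempt ID -> all threads registered for that task
    private final Map<Long, Set<Thread>> threadsByTask = new ConcurrentHashMap<>();

    // Record that thread t is doing work on behalf of the given task.
    void register(long taskAttemptId, Thread t) {
        threadsByTask
            .computeIfAbsent(taskAttemptId, k -> ConcurrentHashMap.newKeySet())
            .add(t);
    }

    // Forget all threads of a finished task.
    void taskDone(long taskAttemptId) {
        threadsByTask.remove(taskAttemptId);
    }

    // Build a report with the current stack trace of every tracked thread.
    String dumpStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, Set<Thread>> e : threadsByTask.entrySet()) {
            sb.append("task ").append(e.getKey()).append('\n');
            for (Thread t : e.getValue()) {
                sb.append("  thread ").append(t.getName())
                  .append(" (").append(t.getState()).append(")\n");
                // getStackTrace() returns an empty array for threads that
                // have not started or have already terminated.
                for (StackTraceElement frame : t.getStackTrace()) {
                    sb.append("    at ").append(frame).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```

With something like this, a task thread and any writer or reader thread feeding the Python worker would each call `register` with the same task attempt ID, and the stack report would list all of them under that task rather than assuming a single thread.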

Note that there is also a PythonWorkerSemaphore, and it would be great to understand whether the GpuSemaphore could be reused for some of this in the future. I think (though I am not sure) that having all threads use one semaphore would simplify things, though I bet the code isn't set up for it now. For example, as the PythonWorkerSemaphore says, the main semaphore initializes the GPU, but the Python semaphore does not want to do that, because the Python process will initialize access.

@firestarman fyi.

abellina, Oct 15 '22 22:10