
Support using GPU via CUDA

Open • csullivan opened this issue 6 years ago • 12 comments

I am using standard gdb on code that makes CUDA library calls. I don't need to debug the CUDA-specific code, but I would like rr's functionality for the host code. Is there a way to achieve this?

Currently, when I run rr record on a binary that calls cuInit(), I get a CUDA_ERROR_NO_DEVICE assertion.

csullivan, Apr 02 '18

rr blocks the use of GPU drivers because it doesn't understand how they communicate between userspace and the kernel/hardware. That's probably what's biting you here.

If there's a way to force-enable a CUDA backend that runs entirely on the host CPU, that would solve this problem. Otherwise, it probably can't be fixed easily.

It might be possible to teach rr to support specific families of GPU drivers, but this would be a difficult (and interesting!) project ... especially for closed-source drivers.

rocallahan, Apr 02 '18

I looked into properly supporting the CUDA driver in the past, but didn't get very far: IIRC, the driver touched memory that wasn't referred to or pointed to by any of the ioctl arguments. Repo at https://github.com/maleadt/rr/commits/cuda
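To make that ioctl problem concrete: a record/replay tool can only save and restore memory it knows the kernel wrote. The C sketch below shows the generic decoding a tool could fall back on, assuming the driver's request numbers follow the standard Linux _IOC encoding from linux/ioctl.h (proprietary drivers are not obliged to). ioctl_output_size is a hypothetical helper for illustration, not rr's actual code.

```c
#include <stddef.h>
#include <linux/ioctl.h>

/* Hypothetical helper (not rr code): how many bytes of user memory, starting
 * at the ioctl's pointer argument, the kernel may write for this request,
 * judging only from the _IOC bits encoded in the request number. */
static size_t ioctl_output_size(unsigned long request) {
    /* _IOC_READ means "userspace reads the result", i.e. the kernel writes
     * into the buffer; _IOC_SIZE is the size of that buffer's type. */
    if (_IOC_DIR(request) & _IOC_READ)
        return _IOC_SIZE(request);
    /* _IOC_WRITE-only and _IOC_NONE requests write nothing back through the
     * argument pointer. This says nothing about pointers nested inside the
     * argument struct or about any other memory the driver modifies. */
    return 0;
}
```

With this decoding, a tool could snapshot that many bytes at the argument pointer after the ioctl returns and restore them on replay. Any writes the driver makes elsewhere, for example through pointers embedded in the argument struct or into separately mapped buffers, are invisible to this scheme, which is the kind of gap described in the comment above.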

maleadt, Nov 17 '18

Mmm thanks!

rocallahan, Nov 17 '18

I noticed that nvprof supports kernel replay, collecting different profiling metrics over several passes. Are there any documents or implementation details that could help?

zingdle, Nov 20 '19

Do you mean Nvidia docs or rr docs? For rr, you have all the code, and there's a high-level overview here: https://arxiv.org/abs/1705.05937

rocallahan, Nov 20 '19

I mean NVIDIA docs. Since NVIDIA can apparently replay kernels in its own profiler, nvprof, I'm wondering whether any docs shed light on this issue.

zingdle, Nov 20 '19

Yeah, I ran into this issue as well. rr really cannot touch GPU global memory. I hope someone can solve this later.

yuxineverforever, Aug 23 '20

Issue #2507 is related: supporting GPUs via GL.

rocallahan, Dec 22 '21

@Keno, do you happen to know how Nvidia GPU virtualization works in Linux guest VMs? Does it use the same undocumented driver interface as on bare metal?

rocallahan, Dec 22 '21

@maleadt is the expert. He'll be able to say.

Keno, Dec 22 '21

Sorry, I don't have any experience with NVIDIA's (non-free) Virtual GPU software. The manual does mention a separate vGPU driver, but I couldn't inspect it because you need a license even to download the software.

maleadt, Dec 22 '21