softmax-splatting
Multi-GPU illegal memory access in softmax-splatting – any recommended workaround?
Description
I’m integrating the softmax-splatting module into my own training pipeline. On a single GPU everything works flawlessly, but as soon as I switch to multi-GPU training (e.g. via nn.DataParallel), I hit a CUDA illegal memory access error deep inside the softmax-splatting call.
I haven’t modified the softmax-splatting implementation itself (aside from adapting imports), so I’m wondering:
- Have you ever observed this issue when training on multiple GPUs?
- Do you use any special cupy flags or memory-allocation strategies (e.g. stream settings, device context) to make softmax-splatting safe under DataParallel or DDP?
- Do you know of any workarounds (for example, batching differently, pinning certain buffers to CPU, or an alternate splatting implementation) that would avoid the illegal memory access?
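For reference, this is the kind of device-context guard I have been experimenting with, without success so far. It is only a sketch: softsplat_fn is a placeholder for the module’s actual splatting entry point, which I pass in rather than naming.

```python
import torch

def softsplat_on_input_device(tenIn, tenFlow, tenMetric, strMode, softsplat_fn):
    # Hypothetical wrapper (softsplat_fn is a stand-in for the real
    # splatting call): pin both the PyTorch and CuPy current-device
    # contexts to the GPU holding the inputs, so kernels launched from a
    # DataParallel replica do not implicitly target device 0.
    if tenIn.is_cuda:
        import cupy
        with cupy.cuda.Device(tenIn.device.index), torch.cuda.device(tenIn.device):
            return softsplat_fn(tenIn, tenFlow, tenMetric, strMode)
    # CPU fallback: no device context needed.
    return softsplat_fn(tenIn, tenFlow, tenMetric, strMode)
```

Is something along these lines what you would recommend, or does the failure sit deeper, e.g. in how the raw kernels are cached per device?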
Below is a minimal outline of my setup and the error:
Environment
- OS: Ubuntu 20.04
- Python: 3.10.8
- PyTorch: 2.1.1+cu121
- CUDA: 12.1
- cupy: cupy-cuda12x 11.5.0
- GPUs: 2 × NVIDIA A100 40 GB
Error snippet
temp = warp(x_start[idx].unsqueeze(0),
flow0[idx].unsqueeze(0) * t_new[idx])[0]
File ".../new_interface.py", line 255, in warp_b_s
image1 = image1.to(device)
RuntimeError: CUDA error: an illegal memory access was encountered
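In case it helps frame the last question above: one alternative I am considering is dropping nn.DataParallel in favor of DistributedDataParallel with one process per GPU, so each process only ever touches a single device. A rough sketch of the per-worker setup I have in mind (the backend choice, address, and port are placeholders, not values from my actual config):

```python
import os
import torch
import torch.distributed as dist

def setup_worker(rank: int, world_size: int) -> None:
    # Hypothetical DDP bootstrap: one process per GPU means every
    # PyTorch (and CuPy) kernel launched in this process targets one
    # fixed device, sidestepping cross-device launches entirely.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder address
    os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
```

Would you expect the illegal memory access to disappear under this one-process-per-device setup, or have you seen it there as well?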
Thank you for any guidance! — A user of your softmax-splatting module