
Multi-GPU illegal memory access in softmax-splatting – any recommended workaround?

Open kmp1001 opened this issue 5 months ago • 0 comments

Description

I’m integrating the softmax-splatting module into my own training pipeline. On a single GPU everything works flawlessly, but as soon as I switch to multi-GPU training (e.g. via nn.DataParallel), I immediately hit a CUDA illegal memory access error deep inside the softmax-splatting call.

I haven’t modified the softmax-splatting implementation itself (aside from adapting imports), so I’m wondering:

  1. Have you ever observed this issue when training on multiple GPUs?
  2. Do you use any special cupy flags or memory-allocation strategies (e.g. stream settings, device context) to make softmax-splatting safe under DataParallel or DDP?
  3. Do you know of any workarounds (for example, batching differently, pinning certain buffers to CPU, or an alternate splatting implementation) that would avoid the illegal memory access?
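For what it's worth, one guess I am experimenting with is forcing the call to run inside the input tensor's CUDA device context, on the theory that under nn.DataParallel the replica on cuda:1 receives its tensors on that device while the cupy kernel may be launched in the default cuda:0 context. A minimal sketch of the guard (the wrapped `fn` stands for the softmax-splatting entry point; the wrapper name is mine, not the module's):

```python
import contextlib
import torch

def run_in_input_device(fn, ten_in, *args, **kwargs):
    # Guess at a workaround: enter the input tensor's CUDA device
    # context before the call, so the cupy kernel launch and the
    # tensor memory end up on the same GPU. Degrades to a no-op
    # context for CPU tensors, so single-GPU/CPU runs are unchanged.
    guard = (torch.cuda.device(ten_in.device)
             if ten_in.is_cuda else contextlib.nullcontext())
    with guard:
        return fn(ten_in, *args, **kwargs)
```

On a single GPU this changes nothing; I have not yet confirmed whether it actually prevents the crash on the second replica, which is partly why I am asking.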

Below is a minimal outline of my setup and the error:
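Roughly, the setup looks like this (heavily simplified; `Interpolator` is a placeholder for my model, which internally calls the softmax-splatting warp, and a plain convolution stands in for it here so the sketch is self-contained):

```python
import torch
import torch.nn as nn

class Interpolator(nn.Module):
    """Placeholder for my actual model, which calls softmax-splatting
    on (frame, flow) pairs inside forward()."""

    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, frame, flow):
        # Real model: softmax-splatting warp of `frame` by `flow`,
        # followed by refinement. Only the refinement stand-in here.
        return self.refine(frame)

model = Interpolator()
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # This is the step that triggers the crash for me: DataParallel
    # scatters the batch, and the replica on cuda:1 hits the illegal
    # memory access inside the splatting kernel.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```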


Environment

  • OS: Ubuntu 20.04
  • Python: 3.10.8
  • PyTorch: 2.1.1+cu121
  • CUDA: 12.1
  • cupy: cupy-cuda12x 11.5.0
  • GPUs: 2 × NVIDIA A100 40 GB

Error snippet

        temp = warp(x_start[idx].unsqueeze(0),
                    flow0[idx].unsqueeze(0) * t_new[idx])[0]
      File ".../new_interface.py", line 255, in warp_b_s
        image1 = image1.to(device)
    RuntimeError: CUDA error: an illegal memory access was encountered

(Note that the error surfaces at a seemingly harmless `image1.to(device)` call; I assume this is just where the asynchronous CUDA error happens to be reported, not the kernel that actually faulted.)

Thank you for any guidance! — A user of your softmax-splatting module

kmp1001, May 12 '25 08:05