NuRec Docker container fails with "no kernel image is available for execution on the device" on Tesla T4 GPU

Open ghazalmatt3r opened this issue 7 months ago • 0 comments

Environment Details

  • OS: Ubuntu 22.04 LTS (x86_64)
  • GPU: NVIDIA Tesla T4 (Compute Capability 7.5)
  • NVIDIA Driver: 570.172.08
  • CUDA Version: 12.8 (host system)
  • Docker Image: carlasimulator/nvidia-nurec-grpc:0.2.0
  • CARLA Version: 0.9.16
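
For anyone comparing their setup against the one above, the GPU name and compute capability can be read programmatically. This is a minimal sketch, assuming a driver recent enough that `nvidia-smi` supports the `compute_cap` query field. `parse_gpu_line` and `query_gpus` are hypothetical helper names, not part of CARLA or NuRec.

```python
import subprocess


def parse_gpu_line(line):
    """Parse one CSV line such as 'Tesla T4, 7.5' into (name, (major, minor))."""
    # Split on the last comma so a comma inside the GPU name is preserved.
    name, cap = (field.strip() for field in line.rsplit(",", 1))
    major, minor = (int(part) for part in cap.split("."))
    return name, (major, minor)


def query_gpus():
    """Ask nvidia-smi for the name and compute capability of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

On the machine described here, `query_gpus()` would be expected to report the Tesla T4 with capability `(7, 5)`. Running the same query inside the container shows whether the container sees the same device.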

Problem Description

The NuRec Docker container starts successfully and finds the scene, but fails when trying to create the neural reconstruction backend with the error:

CUDA error: no kernel image is available for execution on the device

Steps to Reproduce

  1. Install CUDA 12.8 and NVIDIA driver 570
  2. Run the NuRec example script:
python example_nurec_replay_save_images.py --usdz-filename PhysicalAI-Autonomous-Vehicles-NuRec/sample_set/25.07_release/026d6a39-bd8f-4175-bc61-fe50ed0403a3/026d6a39-bd8f-4175-bc61-fe50ed0403a3.usdz --move-spectator --saveimages

Expected Behavior

The script should successfully create the neural reconstruction backend and process the scene.

Actual Behavior

The script fails with a CUDA kernel compatibility error during backend creation.

Additional Information

  • The container starts successfully and reports the correct compute capability (7.5)
  • The error occurs specifically during the "Creating new backend" phase
  • Both the host system and the container have CUDA 12.8
  • The likely cause is that the CUDA kernels shipped in the container were compiled for GPU architectures that do not include the Tesla T4 (sm_75)
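
The architecture-mismatch hypothesis can be checked by comparing the device's compute capability with the list of architectures a binary was built for. The sketch below encodes the usual CUDA compatibility rules: `sm_XY` binary code runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer capability. `parse_arch` and `can_run` are hypothetical helper names, and the architecture lists in the example are made up for illustration, not read from the NuRec image.

```python
import re


def parse_arch(arch):
    """Split an entry such as 'sm_75', 'compute_90' or 'sm_90a'.

    Returns (kind, major, minor, suffix); the last digit is the minor version.
    """
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)([a-z]?)", arch)
    if match is None:
        raise ValueError(f"unrecognised architecture string: {arch!r}")
    kind, major, minor, suffix = match.groups()
    return kind, int(major), int(minor), suffix


def can_run(capability, arch_list):
    """Return True if any entry in arch_list is usable on the given device."""
    dev_major, dev_minor = capability
    for arch in arch_list:
        kind, major, minor, suffix = parse_arch(arch)
        if suffix:
            # Architecture-specific targets (e.g. sm_90a) only match exactly.
            if (major, minor) == (dev_major, dev_minor):
                return True
        elif kind == "sm":
            # Binary code: same major version, equal or lower minor version.
            if major == dev_major and minor <= dev_minor:
                return True
        else:
            # PTX: can be JIT-compiled for any equal or newer device.
            if (major, minor) <= (dev_major, dev_minor):
                return True
    return False


# Hypothetical build lists, for illustration only.
print(can_run((7, 5), ["sm_80", "sm_86", "sm_90"]))  # T4 not covered
print(can_run((7, 5), ["sm_75", "sm_80"]))           # T4 covered
```

If PyTorch is importable inside the container (an assumption; the error text resembles PyTorch's but this is not confirmed), the real inputs would be `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`. Note that the latter only describes PyTorch's own kernels, so custom CUDA extensions in the image could still lack an sm_75 build even when this check passes.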

Logs

python example_nurec_replay_save_images.py --usdz-filename PhysicalAI-Autonomous-Vehicles-NuRec/sample_set/25.07_release/026d6a39-bd8f-4175-bc61-fe50ed0403a3/026d6a39-bd8f-4175-bc61-fe50ed0403a3.usdz --move-spectator --saveimages
2025-09-18 20:54:57 INFO     CUDA_VISIBLE_DEVICES not set, defaulting to GPU 0
2025-09-18 20:54:57 INFO     Starting container NuRec_clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_run_6289390c on localhost:46435
2025-09-18 20:54:57 INFO     Waiting for server to start and scene to load...
2025-09-18 20:55:22 ERROR    Critical error detected: [2025-09-18 20:55:22,112][nre.grpc.serve][ERROR] Failed to create backend for clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3: CUDA error: no kernel image is available for execution on the device
2025-09-18 20:55:22 ERROR    Timeout waiting for server to be ready
Traceback (most recent call last):
  File "/opt/carla_0.9.16/PythonAPI/examples/nvidia/nurec/example_nurec_replay_save_images.py", line 333, in <module>
    main()
  File "/opt/carla_0.9.16/PythonAPI/examples/nvidia/nurec/example_nurec_replay_save_images.py", line 286, in main
    with NurecScenario(
  File "/opt/carla_0.9.16/PythonAPI/examples/nvidia/nurec/nurec_integration.py", line 695, in __enter__
    super().__enter__()
  File "/opt/carla_0.9.16/PythonAPI/examples/nvidia/nurec/nurec_render_service.py", line 516, in __enter__
    self.start()
  File "/opt/carla_0.9.16/PythonAPI/examples/nvidia/nurec/nurec_render_service.py", line 500, in start
    raise RuntimeError("Server failed to start properly")
RuntimeError: Server failed to start properly

Workaround Attempts

  • Tried various CUDA environment variables (CUDA_VISIBLE_DEVICES=0, CUDA_LAUNCH_BLOCKING=1, TORCH_USE_CUDA_DSA=1, CUDA_CACHE_DISABLE=1)
  • Tried running without flags
  • Verified compute capability compatibility (7.5)
  • Confirmed CUDA version compatibility (12.8)

ghazalmatt3r, Sep 18 '25 21:09