[Proposal] Error response from daemon: unknown or invalid runtime name: nvidia
Proposal:
Update Docker configuration to use modern GPU resource allocation methods instead of the deprecated `runtime: nvidia` specification in CloudXR-related compose files.
Motivation:
The current configuration leads to "unknown or invalid runtime name: nvidia" errors when starting containers, despite the NVIDIA Container Toolkit being properly installed. This occurs because:
- Modern Docker versions (19.03+) prefer `deploy.resources` over an explicit runtime specification (see the sketch after this list)
- Mixed configuration approaches cause conflicts in GPU resource allocation
- The error interrupts CloudXR Runtime initialization, blocking development workflows
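For illustration, here is a minimal sketch contrasting the two styles; the service name is taken from the CloudXR compose setup and the surrounding keys are abbreviated, so the actual files contain more settings:

```yaml
# Legacy style that can trigger "unknown or invalid runtime name: nvidia"
# when no "nvidia" runtime is registered with the Docker daemon:
services:
  cloudxr-runtime:
    runtime: nvidia
---
# Modern style, handled through the NVIDIA Container Toolkit:
services:
  cloudxr-runtime:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```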
Core use cases affected:
- Local development/testing of CloudXR applications
- Isaac Sim integration with CloudXR services
- Multi-container GPU resource sharing scenarios
Related to frustration when:
"I'm unable to launch the container environment even after properly configuring NVIDIA Container Toolkit and verifying GPU accessibility through standard Docker commands."
Alternatives Considered:
- Maintaining the legacy `runtime: nvidia` specification
  - Would require users to modify system-wide Docker configurations
  - Creates compatibility issues across different Docker versions
- Using the `--gpus` CLI flag instead of compose file configuration (see the sketch after this list)
  - Not scalable for multi-service environments
  - Harder to maintain in version-controlled configurations
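For comparison, the CLI-flag alternative would mean launching each service by hand outside of compose; the image names below are placeholders rather than the project's actual images:

```bash
# Placeholder image names; every container must repeat the --gpus flag manually,
# so GPU allocation never ends up in the version-controlled compose files.
docker run --rm --gpus all cloudxr-runtime-image
docker run --rm --gpus all isaac-lab-image
```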
Additional Context:
Current error output:

```
Error response from daemon: unknown or invalid runtime name: nvidia
```

Successful GPU verification test:

```bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

(Works correctly, proving the base NVIDIA configuration is valid)
Checklist:
- [ ] I have checked that there is no similar issue in the repo
Acceptance Criteria:
- [ ] Updated compose files work without runtime specification errors
- [ ] Both CloudXR runtime and Isaac Lab containers can access GPU resources
- [ ] Configuration remains compatible with Docker 19.03+ and NVIDIA Container Toolkit 1.0+
- [ ] Documentation reflects any required changes in setup procedures
System Information:
Docker Version: 28.0.4
NVIDIA Driver Version: 570.124.06
OS Version: Ubuntu 22.04.5 LTS
Thank you for posting this proposal. The team will review it.
Hi @DigitLion, thanks for the proposal. It makes sense to use `--gpus` over `runtime: nvidia`, but the key is to make sure CloudXR Runtime and Isaac Lab share the same GPUs.
If you are using the docker python scripts with docker compose, you can edit the `docker/docker-compose.cloudxr-runtime.patch.yaml` file to remove `runtime: nvidia` and add the following to both `cloudxr-runtime` and `isaac-lab-base`:
```yaml
...
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
...
```
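Assuming the service names above (`cloudxr-runtime` and `isaac-lab-base`), the patched file could look roughly like this; unrelated keys are omitted and the real patch file's layout may differ:

```yaml
# docker/docker-compose.cloudxr-runtime.patch.yaml (sketch, unrelated keys omitted)
services:
  cloudxr-runtime:
    # runtime: nvidia   <- removed
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  isaac-lab-base:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```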
If you use `device_ids` as mentioned in https://docs.docker.com/compose/how-tos/gpu-support/, please make sure you pass the same IDs for both containers.
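For example, a sketch of the `device_ids` form with a made-up ID; whichever IDs you choose must be listed identically in both services:

```yaml
cloudxr-runtime:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["0"]    # example ID; isaac-lab-base must list the same IDs
            capabilities: [gpu]
```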
I will send out a PR soon about the change needed.