Mooncake
[Bug]: RDMA Device Misidentification in Container Environment
Bug Report
Environment:
- Hardware: H20 machine with 4 physical RDMA NICs
- Container setup: 2 GPUs requesting 2 virtual RDMA devices
Issue: RDMA device discovery incorrectly identifies devices in the container environment.
Reproduction:
- Deploy container with 2 GPUs on H20 machine (4 physical RDMA NICs available)
- Request 2 virtual RDMA devices for the container
- Observe incorrect device identification and GID index lookup
Expected: Virtual RDMA devices should be correctly mapped and identified.
Actual: Device discovery fails to properly recognize the virtual RDMA devices.
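For anyone trying to confirm the symptom from inside the container, the following is a minimal diagnostic sketch. It is not Mooncake's discovery code. It assumes the standard Linux sysfs layout under `/sys/class/infiniband` (per-port `gids/<index>` and `gid_attrs/types/<index>` files), and the function names `list_rdma_devices` and `find_gid_index` are hypothetical helpers introduced only for illustration. It lists the RDMA devices visible in the current namespace and returns the first populated GID index of a given type.

```python
from pathlib import Path

# An all-zero GID means the table slot is unused.
ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the RDMA device names visible under sysfs_root, sorted."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir())


def find_gid_index(device, port=1, wanted_type="RoCE v2",
                   sysfs_root="/sys/class/infiniband"):
    """Return the lowest GID index on (device, port) that is populated
    and has the wanted type, or None if there is no such entry."""
    port_dir = Path(sysfs_root) / device / "ports" / str(port)
    gids_dir = port_dir / "gids"
    if not gids_dir.is_dir():
        return None
    indices = sorted(int(p.name) for p in gids_dir.iterdir() if p.name.isdigit())
    for idx in indices:
        try:
            gid = (gids_dir / str(idx)).read_text().strip()
            gid_type = (port_dir / "gid_attrs" / "types" / str(idx)).read_text().strip()
        except OSError:
            # Unused or inaccessible slots can fail to read; skip them.
            continue
        if gid != ZERO_GID and gid_type == wanted_type:
            return idx
    return None


if __name__ == "__main__":
    for dev in list_rdma_devices():
        print(dev, find_gid_index(dev))
```

Comparing this output inside the container with the same script on the host should show whether the container sees only the 2 requested devices and whether their GID tables are populated at the indices the application expects.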
Before submitting...
- [ ] Ensure you searched for relevant issues and read the [documentation]
@stmatengss I will fix this bug, since I have access to the environment.
Fixed in https://github.com/kvcache-ai/Mooncake/pull/1077