Infiniband support

Open TheQuantumFractal opened this issue 1 year ago • 4 comments

Description

I'm looking to do RDMA within gVisor containers and was curious if you support Infiniband or if this would be on the roadmap? Thanks!

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

TheQuantumFractal • Sep 13 '24 15:09

There's no specific support for Infiniband. Can you help me understand what support would be needed? gVisor containers typically communicate through a virtual device (often veth). On a machine with an Infiniband NIC, packets would switch from veth to NIC without issue as far as I understand.

I don't know much about RDMA, but there's no special support for it in gVisor. I'm not sure whether it's needed, or whether having the underlying host support it is enough.

kevinGC • Sep 13 '24 18:09

Hi @kevinGC, we think it would involve supporting the Infiniband verbs in libibverbs, which are operations that let you send and receive data while bypassing the kernel networking stack.

There is a device called /dev/infiniband/uverbs0, but unfortunately none of us are familiar with its internals yet.
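For anyone wanting to check what a container can actually see, here is a minimal sketch that lists uverbs character devices. The function name `list_uverbs_devices` and the `uverbs` name prefix are assumptions made for illustration; only the path /dev/infiniband/uverbs0 comes from this thread.

```python
import os
import stat


def list_uverbs_devices(dev_dir="/dev/infiniband"):
    """Return sorted paths of character devices named uverbs* in dev_dir.

    Hypothetical helper: returns an empty list if the directory is missing
    or unreadable, which is what you would expect inside a sandbox that
    does not expose the device.
    """
    try:
        names = os.listdir(dev_dir)
    except OSError:
        return []

    found = []
    for name in sorted(names):
        if not name.startswith("uverbs"):
            continue
        path = os.path.join(dev_dir, name)
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue
        # Only count real character devices, not regular files or dirs.
        if stat.S_ISCHR(mode):
            found.append(path)
    return found


if __name__ == "__main__":
    devices = list_uverbs_devices()
    print(devices if devices else "no uverbs devices visible")
```

Running this on the host and again inside the container would show whether the device node is passed through at all, which is a separate question from whether operations on it would work.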

We've seen FreeFlow (https://github.com/Microsoft/Freeflow) from Microsoft and would be looking for something similar to maximize throughput.

ekzhang • Sep 25 '24 21:09

Having looked (maybe too) quickly at verbs, I think it should be possible to support them, if my understanding is correct. Thoughts:

  • Infiniband verbs are probably a bunch of ioctls for their special character device. We can support this: we'd make our own virtual per-container/pod /dev/infiniband/uverbs0 that understands and safety-checks ioctls. We'd also have syscall filters specific to Infiniband (as we do for other devices, e.g. GPUs).
  • Based on my super quick look at your links, I think libibverbs works by mapping in some shared memory for notification queues and packet data. This reminds me of XDP support, and so I think should work as well. We would need a link endpoint that speaks Infiniband verbs.
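To make the "understands and safety-checks ioctls" idea from the first bullet concrete, here is a rough sketch of an allowlist check. It is not gVisor code (gVisor is written in Go) and not the real uverbs interface: `HYPOTHETICAL_MAGIC`, the command number, and `MAX_ARG_SIZE` are placeholders rather than values taken from the kernel headers. The only thing assumed to be real is the generic Linux ioctl request layout used on x86-64 and arm64 (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction).

```python
from typing import NamedTuple

# Generic Linux ioctl request layout (as on x86-64 and arm64).
_NR_BITS, _TYPE_BITS, _SIZE_BITS = 8, 8, 14
_TYPE_SHIFT = _NR_BITS
_SIZE_SHIFT = _TYPE_SHIFT + _TYPE_BITS
_DIR_SHIFT = _SIZE_SHIFT + _SIZE_BITS


class IoctlRequest(NamedTuple):
    direction: int  # 0 = none, 1 = write, 2 = read, 3 = read/write
    magic: int      # the "type" byte identifying the driver
    nr: int         # command number within that driver
    size: int       # size in bytes of the argument struct


def encode(direction, magic, nr, size):
    """Pack the four fields into a single request number."""
    return (
        (direction << _DIR_SHIFT)
        | (size << _SIZE_SHIFT)
        | (magic << _TYPE_SHIFT)
        | nr
    )


def decode(request):
    """Split a request number back into its four fields."""
    return IoctlRequest(
        direction=(request >> _DIR_SHIFT) & 0x3,
        magic=(request >> _TYPE_SHIFT) & 0xFF,
        nr=request & 0xFF,
        size=(request >> _SIZE_SHIFT) & 0x3FFF,
    )


# Placeholder values for illustration only.
HYPOTHETICAL_MAGIC = 0x1B
ALLOWED_COMMANDS = {(HYPOTHETICAL_MAGIC, 1)}
MAX_ARG_SIZE = 4096


def is_allowed(request):
    """Accept only known (magic, nr) pairs with a bounded argument size."""
    req = decode(request)
    return (req.magic, req.nr) in ALLOWED_COMMANDS and req.size <= MAX_ARG_SIZE
```

A real proxy would also have to validate the contents of each argument struct and any pointers inside it, which is where most of the work would likely be; this sketch only covers the first gate.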

While the path to implementation seems reasonably clear, this is a significant chunk of work. The implementer would need to understand Infiniband verbs. I think we'd accept a PR for it, but for now it's not on the roadmap.

kevinGC • Sep 25 '24 23:09

Sounds good, thank you for sharing your thoughts on the tractability of this!

ekzhang • Sep 26 '24 00:09