PaddleCloud
Paddle Cloud needs to use the Kubernetes host network.
Right now Paddle runs in pods that are usually on an overlay network, and the overlay can cost some network performance. Most of the time in deep learning we use a high-performance network such as RDMA, so using the host network may make sense.
Support for high-performance networks is indeed what we need, but using the host network will break the design of Kubernetes.
Kubernetes designs the network to be "flat", which means containers should treat each other and hosts equally: https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model
For RDMA, or for "low-latency" cases, we need to implement this design using hardware acceleration or other techniques.
We can use a switch to turn the host network on or off. To achieve this, there should be a component that can figure out which host ports are available. This component should stand alone, independent of Paddle Cloud.
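As a rough illustration of what "figuring out available host ports" could look like on a single node, here is a minimal sketch that probes a port range by trying to bind each port. The function name `find_free_ports` and the 20000-30000 range are assumptions made for this example, not anything Paddle Cloud defines. Note that a port that is free when probed can be taken by another process before the job binds it, so a real component would still need to handle that race.

```python
import socket


def find_free_ports(count, start=20000, end=30000, host="0.0.0.0"):
    """Return up to `count` TCP ports in [start, end) that could be bound on `host`.

    Hypothetical helper: the name and the default range are illustrative only.
    """
    free = []
    for port in range(start, end):
        if len(free) == count:
            break
        # Try to bind the port; if that fails, something else already holds it.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue
            free.append(port)
    return free


if __name__ == "__main__":
    print(find_free_ports(3))
```

The probe socket is closed again as soon as the check is done, so the result is only a snapshot of the node at that moment.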
@drinktee Yep. Still, I think this solution should be treated as a "workaround"; it's not the best technical choice.
Yes, it's a solution for some special scenarios.
We submitted an issue for this before, but Kubernetes will not handle host port auto-allocation, so we need to manage the ports ourselves. I don't think it's a workaround, and we need to make it decent.
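To make "manage the ports ourselves" a bit more concrete, here is a minimal sketch of the bookkeeping side: an in-memory allocator that records which host ports have been handed out on each node. The class `HostPortAllocator`, its methods, and the port range are all hypothetical names chosen for this example. It keeps state only in memory and does not talk to Kubernetes, so it shows the idea rather than a usable component.

```python
class HostPortAllocator:
    """Track host ports handed out per node (hypothetical, in-memory only)."""

    def __init__(self, start=20000, end=30000):
        self._range = range(start, end)
        self._in_use = {}  # node name -> set of ports currently handed out

    def allocate(self, node, count):
        """Reserve `count` ports on `node`, lowest free port first."""
        used = self._in_use.setdefault(node, set())
        picked = []
        for port in self._range:
            if len(picked) == count:
                break
            if port not in used:
                picked.append(port)
        if len(picked) < count:
            raise RuntimeError(f"not enough free ports on node {node}")
        used.update(picked)
        return picked

    def release(self, node, ports):
        """Return ports to the pool when a job finishes."""
        self._in_use.get(node, set()).difference_update(ports)


if __name__ == "__main__":
    allocator = HostPortAllocator(start=7000, end=7004)
    print(allocator.allocate("node-1", 2))
    print(allocator.allocate("node-2", 2))
```

Ports are tracked per node because two jobs on different hosts can safely use the same host port, while two jobs on the same host cannot.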