Paddle Cloud needs to use the Kubernetes host network.

Open drinktee opened this issue 7 years ago • 5 comments

Currently Paddle runs in pods that are usually on an overlay network. The overlay may cause some loss of network performance. In deep learning we mostly use a high-performance network such as RDMA, so using the host network may make sense.

drinktee avatar Oct 30 '17 02:10 drinktee

Support for high-performance networks is indeed what we need, but using the host network will break the design of Kubernetes.

Kubernetes designs its network to be "flat", meaning containers should treat each other and hosts equally: https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model

For RDMA, or for "low-latency" cases, we need to implement this design using hardware accelerators or other techniques.

typhoonzero avatar Oct 30 '17 03:10 typhoonzero

We can use a switch to turn the host network on or off. To achieve this, there should be a component that can figure out the available host ports. This component should stand alone, independent of Paddle Cloud.

drinktee avatar Oct 30 '17 03:10 drinktee
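
A minimal sketch of the switch and the port-discovery component described above, not an actual Paddle Cloud implementation. The function names `find_free_host_ports` and `build_pod_spec`, the port range, and the container name `trainer` are all hypothetical. `hostNetwork` and `dnsPolicy: ClusterFirstWithHostNet` are standard Kubernetes pod spec fields. The probe only reflects the machine it runs on, so it would have to run on the node where the pod is scheduled, and a port could still be taken between the probe and the pod start.

```python
import socket


def find_free_host_ports(count, start=20000, end=21000, host="0.0.0.0"):
    """Return `count` TCP ports in [start, end) that can be bound right now.

    Hypothetical helper: it probes by binding and immediately closing a
    socket, so the result is only a hint, not a reservation.
    """
    free = []
    for port in range(start, end):
        if len(free) == count:
            break
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is already in use on this host
            free.append(port)
    if len(free) < count:
        raise RuntimeError("not enough free ports in the requested range")
    return free


def build_pod_spec(image, ports, use_host_network=False):
    """Build a pod spec dict; `use_host_network` is the on/off switch."""
    spec = {
        "containers": [
            {
                "name": "trainer",  # hypothetical container name
                "image": image,
                "ports": [{"containerPort": p} for p in ports],
            }
        ],
    }
    if use_host_network:
        # The pod shares the node's network namespace.
        spec["hostNetwork"] = True
        # Keeps cluster DNS resolution working for host-network pods.
        spec["dnsPolicy"] = "ClusterFirstWithHostNet"
    return spec
```

With the switch off, the spec is an ordinary overlay-network pod; with it on, the ports returned by the probe are the ones the trainer would listen on directly on the node.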

@drinktee Yep. Still, I think this solution should be treated as a "workaround"; it is not the best technical choice.

typhoonzero avatar Oct 31 '17 02:10 typhoonzero

Yes, it's a solution for some special scenarios.

drinktee avatar Oct 31 '17 02:10 drinktee

We previously submitted an issue about this, but Kubernetes will not handle host port auto-allocation, so we need to manage the ports ourselves. I don't think it's a workaround, and we need to make it a proper solution.

tizhou86 avatar Nov 02 '17 08:11 tizhou86
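
One way to picture "managing the ports ourselves" is a small allocator that records which host ports have been handed out on each node. This is only an illustrative sketch: the class `HostPortAllocator`, its methods, and the port range are assumptions, and a real component would also need persistent state and would have to reconcile with ports used by processes outside its control.

```python
class HostPortAllocator:
    """Track host ports handed out per node (hypothetical, in-memory only)."""

    def __init__(self, start=20000, end=21000):
        self._range = range(start, end)
        self._in_use = {}  # node name -> set of allocated ports

    def allocate(self, node, count):
        """Reserve `count` ports on `node` and return them in ascending order."""
        used = self._in_use.setdefault(node, set())
        picked = [p for p in self._range if p not in used][:count]
        if len(picked) < count:
            raise RuntimeError(f"node {node!r} has fewer than {count} free ports")
        used.update(picked)
        return picked

    def release(self, node, ports):
        """Return ports to the pool, e.g. when a training job finishes."""
        self._in_use.get(node, set()).difference_update(ports)


allocator = HostPortAllocator(start=20000, end=20004)
first = allocator.allocate("node-a", 2)   # [20000, 20001]
second = allocator.allocate("node-a", 2)  # [20002, 20003]
other = allocator.allocate("node-b", 1)   # [20000], nodes are independent
allocator.release("node-a", first)
again = allocator.allocate("node-a", 1)   # [20000], reused after release
```

Keeping the bookkeeping per node reflects that host ports only conflict between pods on the same machine.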