
When deploying with multiple machines, the following error was encountered

Open • nannaer opened this issue 7 months ago • 1 comment

When deploying DeepSeek-V3 with prefill/decode (PD) disaggregation, multi-machine decode runs normally on two machines. With four machines, however, the following errors occur. Figure 1 shows the error on the master machine, and Figure 2 shows the error on a worker machine. I'm not sure whether this error is related to DeepEP. Can anyone help?

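[Figure 1: error output on the master machine (image not available)]

[Figure 2: error output on the worker machine (image not available)]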

nannaer, Jun 04 '25 11:06

It seems that your program encountered an issue while establishing TCP socket connections. Please check your TCP network environment.

sphish, Jun 05 '25 02:06
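One way to act on the suggestion to check the TCP network environment is a quick reachability test run from every machine. The following is a minimal sketch, not part of DeepEP or any launcher. It only checks whether a TCP handshake to each `host:port` completes. The function names (`can_connect`, `check_endpoints`, `report`) are hypothetical. The check is only meaningful while a process on the target machine is actually listening on that port.

```python
import socket
from typing import Dict, Iterable, Tuple

# A (host, port) pair; hypothetical helper type for this sketch.
Endpoint = Tuple[str, int]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, unreachable networks, and DNS
        # failures are all raised as OSError subclasses.
        return False


def check_endpoints(endpoints: Iterable[Endpoint], timeout: float = 3.0) -> Dict[Endpoint, bool]:
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


def report(endpoints: Iterable[Endpoint], timeout: float = 3.0) -> None:
    """Print one line per endpoint so unreachable machines stand out."""
    for (host, port), ok in check_endpoints(endpoints, timeout).items():
        print(f"{host}:{port} -> {'OK' if ok else 'UNREACHABLE'}")
```

For example, run `report([("10.0.0.1", 29500), ("10.0.0.2", 29500)])` on each of the four machines. The addresses and port here are placeholders: substitute the master machine's IP and whichever port your deployment uses for its TCP connections. A machine that reports `UNREACHABLE` points to a firewall, routing, or interface-selection problem between those two machines rather than to the model code.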