tengdecheng

Results 2 comments of tengdecheng

Nice work! I reproduced this on 4 H20 nodes (8 GPUs per node), with 2 prefill nodes and 2 decode nodes, and got the error below: ``` During handling of the above exception,...

> Looks like `DeepEP error: timeout`. Could you please check all nodes' logs to see whether there are other errors before this? Often it is caused by e.g. one node...