nannaer

Results: 8 issues by nannaer

### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- ...

As shown in the figure, after assigning a different num_tokens parameter to each rank and running test_low_latency.py, the process gets stuck. What methods can be used to profile the impact of...
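One generic, stdlib-only way to see where a stuck Python process is blocked is `faulthandler`, which can periodically dump every thread's stack to stderr. This is a hedged sketch, not DeepEP-specific tooling; `run_test` below is a hypothetical stand-in for the body of test_low_latency.py:

```python
import faulthandler
import sys
import time

def run_test():
    # Hypothetical stand-in for the test body; it just sleeps here.
    # In the real scenario this would be the dispatch/combine loop.
    time.sleep(0.5)

# Dump all thread stacks to stderr every 5 seconds until cancelled.
# If the process hangs inside a dispatch or combine call, the dump
# shows which frame each thread is blocked in.
faulthandler.dump_traceback_later(timeout=5, repeat=True, file=sys.stderr)
try:
    run_test()
finally:
    faulthandler.cancel_dump_traceback_later()
```

For a process that is already hung, the same idea applies externally: attach a stack sampler to the live PID instead of instrumenting the script.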

I'd like to know where **_synchronization across all ranks_** is required for both dispatch and combine operations, using the following code that calls low_latency_dispatch and low_latency_combine as an example. Specifically:...
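Both calls are collectives, so every rank must enter them or the ranks that did will block. The following is a minimal stdlib analogy of that property, not DeepEP code: each "rank" is just a thread, and a `threading.Barrier` stands in for the implicit all-rank synchronization point inside a dispatch or combine call.

```python
import threading

WORLD_SIZE = 4
# The barrier models the all-rank synchronization point inside a
# collective such as a dispatch/combine call.
barrier = threading.Barrier(WORLD_SIZE)
results = {}

def rank_worker(rank, participate):
    if not participate:
        return  # this rank never enters the collective
    try:
        # Time out instead of hanging forever, so the deadlock is visible.
        barrier.wait(timeout=0.5)
        results[rank] = "ok"
    except threading.BrokenBarrierError:
        results[rank] = "stuck"

def run(missing_rank=None):
    results.clear()
    barrier.reset()
    threads = [
        threading.Thread(target=rank_worker, args=(r, r != missing_rank))
        for r in range(WORLD_SIZE)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)

print(run())                # all ranks enter the collective: all proceed
print(run(missing_rank=3))  # one rank skips the call: the others block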

When deploying PD-separated DeepSeek-V3, multi-machine decode works normally with two machines. However, with four machines, the following errors occur. I hope to get your help....

When using flash-attn-3 3.0.0b1 as the attention kernel in our serving engine to serve Qwen3-235B-A22B, we encounter an illegal memory access, as follows: (ModelRunner pid=871993, ip=10.102.207.116) CUDA error (/workspace/flash-attention/hopper/flash_fwd_launch_template.h:198): an illegal memory access...

Thank you very much for your contributions to the RTP-LLM inference engine! I have a question about the load balancing strategy in the technical report. In the DeepEP framework, Dispatch...

When running the two-machine test_low_latency.py (EP16), there is a significant difference in the test results of dispatch and combine on the two machines. My version number is 9fe9021, and I...

Thank you very much for your contributions to DeepEP! I have a question about latency in the per-rank batch-imbalance scenario. The above figure is my theoretical estimation...
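As a hedged sketch of one common way to reason about this (a toy model of my own, not a formula from DeepEP or the report): if each rank must finish exchanging the largest per-rank payload before the step can proceed, step latency tracks max over ranks of tokens, not the mean. The constants below are made-up illustrative numbers, not measured values.

```python
def step_latency_us(tokens_per_rank, per_token_us=0.1, fixed_us=20.0):
    """Toy model: latency = fixed overhead + cost of the busiest rank.

    per_token_us and fixed_us are illustrative constants only,
    not measured DeepEP numbers.
    """
    return fixed_us + max(tokens_per_rank) * per_token_us

balanced   = step_latency_us([128] * 8)         # every rank sends 128 tokens
imbalanced = step_latency_us([16] * 7 + [912])  # same total, one hot rank
print(balanced, imbalanced)
```

Under this model, two workloads with the same total token count can differ widely in latency: the imbalanced one is gated by its single busiest rank.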