Yizhi Wang

10 comments by Yizhi Wang

I think you should check AI-generated issues yourself before submitting to avoid basic mistakes.

> https://github.com/deepseek-ai/DeepEP/tree/try_fix_roce_mqp @sphish May I ask if this change will be incorporated into the main branch?

> Nice work! But actually I don't have some guideline to tune this. And we are working on a TMA version instead of any kind of LD/ST copies. We will...

I am also experiencing the same hang issue. In my case, it occurs with a setup of 4 machines (H20*8), where 2 machines function normally. The program is also stuck...

@shifangx I have also tested on 4*4 GB200, and it runs successfully. Here are my tests on 4*8 H20. When I set `do_check=False` or skip `round_scale==True`, the test runs successfully....

> what about `8*4 GB200` @shifangx Unfortunately, I only have a 4x4 GB200, so I can't test the larger configuration.

@shifangx Thanks! The assert didn't print out correctly after it was triggered, which made the issue quite confusing. What is the theoretical lower bound for precision when using ue8m0 for...

Another error:

```
[config] num_tokens=4096, hidden=7168, num_topk_groups=4, num_topk=8
[layout] Kernel performance: 0.045 ms
[testing] Running with BF16, without top-k (async=False, previous=False) ... passed
[testing] Running with BF16, with top-k (async=False,...
```

Some other errors (with my print modifications).