remaper
Quoted from https://zhuanlan.zhihu.com/p/714761319:

> @zhyncs
>
> Very interesting article. About a month ago we implemented the A_CC_ME version you mention in SGLang: [How to view the MoE model DeepSeek-V2 released by DeepSeek?](https://www.zhihu.com/question/655172528/answer/3584052758)
>
> We also optimized MQA decoding: [Optimize MLA/GQA/MQA Triton decoding by ispobock ·...
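For readers unfamiliar with the absorbed ("A_CC") formulation referenced above, here is a minimal sketch of the idea: by associativity, the key up-projection `W_UK` can be folded into the query, so attention scores are computed directly over the cached compressed latent. That is what makes MLA decode look like MQA with a single shared "KV head". Shapes and names below are illustrative only, not SGLang's actual implementation:

```python
import torch

# Hypothetical dimensions for illustration only.
d_model, d_latent, d_head = 64, 16, 32
W_DKV = torch.randn(d_model, d_latent)   # down-projection to the shared KV latent
W_UK = torch.randn(d_latent, d_head)     # up-projection from latent to per-head keys

x = torch.randn(10, d_model)             # 10 cached tokens
q = torch.randn(d_head)                  # one query head

c = x @ W_DKV                            # compressed latent: the only thing cached

# Naive MLA: decompress keys per head, then score against the query.
scores_naive = (c @ W_UK) @ q

# Absorbed: fold W_UK into the query once; score directly on the latent,
# which every query head shares -- effectively MQA over the latent cache.
scores_absorbed = c @ (W_UK @ q)

assert torch.allclose(scores_naive, scores_absorbed, atol=1e-3)
```

The two score vectors are identical up to floating-point error; the absorbed form avoids materializing the decompressed keys at decode time.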
@zhyncs Why? Have you solved the problem?
@zhyncs It seems that when you compute attention, you use only DP parallelism and not TP parallelism? But I saw in the original paper that TP parallelism was used when...
@CHesketh76 Have you solved this problem?
> I had another question regarding DP attention. The [sglang blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models) mentions that DP attention is effective because MLA has only 1 KV head, so sharding attention with TP causes unnecessary duplication...
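To make the duplication argument concrete, here is a back-of-the-envelope sketch (my own illustrative numbers, not measurements from the blog): TP shards whole KV heads, and a single head cannot be split, so with 1 KV head every TP rank ends up holding a full copy of the cache. DP attention instead gives each rank its own requests, so each rank stores only its own cache.

```python
import math

# Rough KV-cache accounting; illustrative numbers only.
def kv_bytes_per_rank(num_kv_heads, head_dim, tokens, layers, tp,
                      entries_per_head=2, dtype_bytes=2):
    # Each TP rank holds ceil(num_kv_heads / tp) whole KV heads; with
    # num_kv_heads=1, every rank keeps a full copy of the cache.
    heads_per_rank = math.ceil(num_kv_heads / tp)
    return heads_per_rank * head_dim * entries_per_head * tokens * layers * dtype_bytes

tp = 8
# A GQA model with 8 KV heads shards its K and V caches cleanly across 8 ranks.
gqa = kv_bytes_per_rank(num_kv_heads=8, head_dim=128, tokens=4096, layers=60, tp=tp)
# MLA caches one shared latent per token (entries_per_head=1), so under TP
# all 8 ranks hold identical copies: 8x duplication that DP attention avoids.
mla = kv_bytes_per_rank(num_kv_heads=1, head_dim=576, tokens=4096, layers=60, tp=tp,
                        entries_per_head=1)
print(f"GQA per-rank: {gqa / 2**20:.0f} MiB (sharded); "
      f"MLA per-rank: {mla / 2**20:.0f} MiB, duplicated {tp}x under TP")
```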
chunked_prefill_enable = False
INFO 09-01 12:46:11 async_llm_engine.py:268] 7cbe74f5c90c4a95954ae8b87d36a3c6 finished E2E: 0.29664182662963867, TTFT: 0.29621362686157227, TBT: 0.00042819976806640625, TIQ: 0.001392364501953125
INFO 09-01 12:46:15 async_llm_engine.py:268] 9bbc02b5dc904963a915612fc8951d0a finished E2E: 0.29630255699157715, TTFT: 0.2959132194519043, TBT: 0.00038933753967285156, TIQ:...