Li Hui
I have the same problem when I enable FlashInfer MLA.
> [@pseudotensor](https://github.com/pseudotensor) [@Hugh-yw](https://github.com/Hugh-yw) [@lambert0312](https://github.com/lambert0312) The issue of bad output should be fixed by [#3785](https://github.com/sgl-project/sglang/pull/3785), please stay tuned! Great work @Fridge003
I see `compatible with radix cache and chunked prefill`. How is it going? Long context scenarios require this feature. @zhyncs
The overlap scheduler with DP attention cannot be used on A800 * 4 because it always runs out of memory (OOM).
[DeepSeek MTP spec decode #12755](https://github.com/vllm-project/vllm/pull/12755) implements DeepSeek MTP (https://github.com/vllm-project/vllm/issues/12181) to support DeepSeek MTP layers for next-n prediction.
This is [CentML](https://github.com/CentML)'s implementation of the DeepSeek MTP modules that enable speculative decoding for DeepSeek-R1: https://github.com/vllm-project/vllm/pull/12915
> Thank you ! I am working on ROCM (MI210) platform. Will update soon.

Can you verify the A800 environment? @yiakwy-xpu-ml-framework-team
> > > Thank you ! I am working on ROCM (MI210) platform. Will update soon. > > > > > > Can you verify the A800 environment? @yiakwy-xpu-ml-framework-team >...
@yiakwy-xpu-ml-framework-team I rebuilt the kernel using the new code, and the following error occurred when starting:
```
[2025-02-19 02:32:32 TP29] Scheduler hit an exception: Traceback (most recent call last): File...
```