TensorRT-LLM feat: deepseek_v1 gqa and correct normalization mode

For deepseek_v1, norm_topk_prob should be false; see the model's config.json:

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
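For context, here is a minimal sketch of what norm_topk_prob controls in softmax top-k MoE gating. It assumes gating along the lines of the Hugging Face DeepSeek-MoE modeling code: a softmax over all expert logits, then top-k selection. The function name route_tokens and the pure-Python form are hypothetical illustrations, not TensorRT-LLM's implementation:

```python
import math


def route_tokens(logits, top_k, norm_topk_prob):
    """Hypothetical sketch of softmax top-k MoE gating for one token."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Select the top_k experts by gate score, highest first.
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    top_w = [scores[i] for i in top_idx]

    if norm_topk_prob:
        # Rescale the selected weights so they sum to 1.
        s = sum(top_w)
        top_w = [w / s for w in top_w]
    # Otherwise the raw softmax scores are used as-is (they sum to < 1).
    return top_idx, top_w


# Same logits, both modes.
idx, raw = route_tokens([2.0, 1.0, 0.0, -1.0], top_k=2, norm_topk_prob=False)
_, normed = route_tokens([2.0, 1.0, 0.0, -1.0], top_k=2, norm_topk_prob=True)
print(idx, sum(raw), sum(normed))
```

Both modes pick the same experts with the same weight ratios. Only the scale of the weights differs, so using the wrong mode changes the magnitude of the combined expert output rather than the routing decision.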

Jan 23 '25 12:01 akhoroshev

Hi @akhoroshev, thanks for your contribution. We'll take a look at it.

Jan 25 '25 14:01 nv-guomingz

Hi @akhoroshev, we fixed this internally about a month ago. Because of the 0.17 release, we paused the weekly updates for a while. Now that 0.17 has been released, we'll resume the weekly updates soon.

As for the GQA change, AFAIK DeepSeek doesn't use this kind of attention. Do you have any specific reason for that?

Feb 05 '25 05:02 nv-guomingz

> Do you have any specific reason for that?

I have an internal model based on deepseek_v1 with GQA.
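A hedged sketch of what GQA means for the attention layout: assuming an HF-style config where GQA is expressed as num_key_value_heads being smaller than num_attention_heads, each K/V head is shared by a contiguous group of query heads. kv_head_for_query_head is a hypothetical helper for illustration only:

```python
def kv_head_for_query_head(q_head, num_attention_heads, num_key_value_heads):
    """Hypothetical helper: which K/V head a given query head reads under GQA."""
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    # Query heads are split into num_key_value_heads contiguous groups;
    # every query head in a group attends using the same K/V head.
    group_size = num_attention_heads // num_key_value_heads
    return q_head // group_size


# 8 query heads sharing 2 K/V heads -> [0, 0, 0, 0, 1, 1, 1, 1]
print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

When num_key_value_heads equals num_attention_heads, the mapping is the identity and this reduces to ordinary multi-head attention. With fewer K/V heads, the KV cache shrinks by the ratio num_attention_heads / num_key_value_heads.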

Feb 05 '25 06:02 akhoroshev

> > Do you have any specific reason for that?
>
> I have an internal model based on deepseek_v1 with GQA.

OK. I'm afraid we can't merge this PR, since at the moment it only works for a private model.

Feb 06 '25 00:02 nv-guomingz

> OK. I'm afraid we can't merge this PR, since at the moment it only works for a private model.

I also found an open model that is based on deepseek_v1 with GQA:

https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct

Feb 06 '25 15:02 akhoroshev

@akhoroshev Hi, we plan to deprecate DeepSeek V1/V2 support and keep only V3/R1 model support, so we may not accept this MR for now.

Thanks,
June

Mar 24 '25 05:03 juney-nvidia