feat: deepseek_v1 gqa and correct normalization mode

Open akhoroshev opened this issue 11 months ago • 6 comments

For deepseek_v1, norm_topk_prob should be false.

https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
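For context, norm_topk_prob controls whether the gate weights of the selected top-k experts are renormalized to sum to 1 or used as raw softmax probabilities. The following is a minimal sketch of the general idea, assuming softmax gating over all experts; `route_topk` is a hypothetical helper for illustration, not a TensorRT-LLM or Hugging Face API.

```python
import math

def route_topk(logits, k, norm_topk_prob):
    """Hypothetical helper: route one token to its top-k experts.

    Returns (expert_ids, weights). Assumes softmax gating over all experts.
    """
    # Numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k experts with the highest gate probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]

    # norm_topk_prob=True rescales the kept weights to sum to 1;
    # norm_topk_prob=False keeps the raw softmax mass (sum < 1 when k < n).
    if norm_topk_prob:
        s = sum(weights)
        weights = [w / s for w in weights]
    return top, weights
```

With logits [2.0, 1.0, 0.0, -1.0] and k=2, both modes select experts 0 and 1 with the same weight ratio, but only the normalized mode makes the weights sum to 1. Because the expert outputs are scaled by these weights, using the wrong mode changes the MoE layer's output magnitude.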

akhoroshev, Jan 23 '25 12:01

Hi @akhoroshev, thanks for your contribution. We'll take a look at it.

nv-guomingz, Jan 25 '25 14:01

Hi @akhoroshev, we fixed this internally about a month ago. Because of the 0.17 release, we suspended the weekly updates for a while. Now that 0.17 has been released, we'll resume the weekly updates soon.

As for the GQA change, DeepSeek doesn't use this attention variant, AFAIK. Do you have a specific reason for it?

nv-guomingz, Feb 05 '25 05:02
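In grouped-query attention (GQA), several query heads share one key/value head, so the model has fewer KV heads than query heads (in Hugging Face configs, num_key_value_heads is smaller than num_attention_heads). A minimal sketch of the head mapping, assuming contiguous groups as in the common "repeat KV heads" approach; `kv_head_for_query_head` and `expand_kv` are hypothetical names for illustration only.

```python
def kv_head_for_query_head(q_head, num_heads, num_kv_heads):
    """Hypothetical helper: map a query head index to the KV head it reads."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group_size = num_heads // num_kv_heads
    # Contiguous grouping: query heads [0, group_size) share KV head 0, etc.
    return q_head // group_size

def expand_kv(kv_heads, num_heads):
    """Hypothetical helper: repeat each KV head so there is one per query head."""
    num_kv_heads = len(kv_heads)
    return [kv_heads[kv_head_for_query_head(h, num_heads, num_kv_heads)]
            for h in range(num_heads)]
```

With 8 query heads and 2 KV heads, query heads 0-3 read KV head 0 and heads 4-7 read KV head 1. When the two counts are equal, the mapping is the identity and this reduces to standard multi-head attention.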

I have an internal model based on deepseek_v1 that uses GQA.

akhoroshev, Feb 05 '25 06:02

OK. I'm afraid we can't merge this PR, since it only works for a private model at the moment.

nv-guomingz, Feb 06 '25 00:02

I also found an open model based on deepseek_v1 that uses GQA:

https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct

akhoroshev, Feb 06 '25 15:02

@akhoroshev Hi, we plan to deprecate DeepSeek V1/V2 support and keep only V3/R1 support, so we may not accept this PR for now.

Thanks,
June

juney-nvidia, Mar 24 '25 05:03