feat: deepseek_v1 gqa and correct normalization mode
For deepseek_v1, norm_topk_prob should be false.
https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/config.json
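To illustrate what the flag controls, here is a minimal sketch of top-k MoE router gating. This is not the actual TensorRT-LLM or transformers implementation; the function name and shapes are made up for illustration. The point is that with norm_topk_prob=False the top-k softmax weights are used as-is, while with True they are renormalized to sum to 1.

```python
import math

def route_tokens(logits, top_k, norm_topk_prob):
    """Hypothetical sketch of top-k MoE routing for one token.

    logits: router logits, one per expert.
    Returns (selected expert indices, gating weights).
    """
    # Softmax over all experts.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top-k experts by probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weights = [probs[i] for i in ranked]
    if norm_topk_prob:
        # Renormalize the kept weights to sum to 1.
        # Per the deepseek-moe-16b-base config, deepseek_v1 skips this step.
        s = sum(weights)
        weights = [w / s for w in weights]
    return ranked, weights
```

With norm_topk_prob=False the gate outputs can sum to less than 1, which changes the scale of the expert mixture, so using the wrong mode silently degrades outputs.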
Hi @akhoroshev, thanks for your contribution, we'll take a look at it.
Hi @akhoroshev, we fixed this internally about a month ago. Due to the 0.17 release, we suspended the weekly updates for a while. Since 0.17 has already been released, we'll resume the weekly updates soon.
As for the GQA part of the change: deepseek doesn't use this attention mechanism AFAIK. Do you have any specific reason for it?
I have an internal model based on deepseek_v1 that uses GQA.
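For context on the GQA change being discussed: in grouped-query attention, the model has fewer key/value heads than query heads, and each KV head is shared by a contiguous group of query heads. A minimal sketch of that mapping (the function name is hypothetical, not from this PR):

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Return the KV head index that serves a given query head under GQA.

    num_kv_heads == num_q_heads gives standard multi-head attention;
    num_kv_heads == 1 gives multi-query attention.
    """
    assert num_q_heads % num_kv_heads == 0, "KV heads must evenly divide Q heads"
    group_size = num_q_heads // num_kv_heads
    # Consecutive query heads share one KV head.
    return q_head // group_size
```

Supporting GQA in the deepseek_v1 path essentially means honoring a num_key_value_heads-style setting smaller than the number of attention heads when loading weights and building the KV cache.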
OK. I'm afraid we can't merge this PR, since it only works for a private model at the moment.
I also found an open model based on deepseek v1 that uses GQA:
https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct
Hi @akhoroshev, we plan to deprecate DS V1/V2 support and keep only V3/R1 model support, so we may not accept this MR for now.
Thanks June