fairydreaming
> I haven't tested as well, but it seems good so feel free to merge @ggerganov

I noticed that Snowflake changed the Arctic model 2 weeks ago. The commit says:...
I'm working on it right now: https://youtu.be/1AG-GUtDvaw

The code needs some cleanup, so it's not published yet.
> @fairydreaming Oh wow how awesome!! How does the ppl look?

@SinanAkkoyun At this moment it's somewhat high (Q8_0):

```
perplexity: tokenizing the input ..
perplexity: tokenization took 1107.87 ms
...
```
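(For reference, the `ppl` being compared here is standard token-level perplexity over the evaluation text, i.e. the exponentiated average negative log-likelihood:)

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)$$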
You can try my branch if you want: https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2

The model works but there are several issues:

- The implementation is suboptimal, since it permutes K and Q tensors during... (see the sketch after this comment)
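A minimal sketch of the kind of permute-plus-copy that first bullet refers to, using ggml's public API; the function and tensor names here are hypothetical, not the branch's actual code:

```cpp
#include "ggml.h"

// Illustrative only: ggml_permute() just relabels strides, so a ggml_cont()
// is needed afterwards to materialize the data in the new layout. Doing this
// full copy for both K and Q on every graph build is the overhead in question.
static struct ggml_tensor * permute_for_attn(struct ggml_context * ctx,
                                             struct ggml_tensor  * q_cur) {
    // q_cur assumed to be [n_embd_head, n_head, n_tokens]
    struct ggml_tensor * q = ggml_permute(ctx, q_cur, 0, 2, 1, 3);
    return ggml_cont(ctx, q); // extra copy: [n_embd_head, n_tokens, n_head]
}
```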
> How many are the parameters? I don't think we have a better solution than adding them to the GGUF header

@ggerganov here they are:

```
// TODO maybe move...
```
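For readers skimming the thread: these are model hyperparameters that existing GGUF files never carried, so they have to become new metadata keys. A hypothetical sketch of the kind of fields involved, named after DeepSeek-V2's HF config rather than the PR's final GGUF keys:

```cpp
#include <cstdint>

// Hypothetical field names mirroring the HF config (modeling_deepseek.py);
// the actual names and GGUF keys in the PR may differ.
struct deepseek2_hparams {
    uint32_t q_lora_rank;           // rank of the low-rank Q projection (MLA)
    uint32_t kv_lora_rank;          // rank of the compressed joint KV (MLA)
    uint32_t first_k_dense_replace; // leading dense (non-MoE) layers
    uint32_t n_shared_experts;      // shared experts active for every token
    float    routed_scaling_factor; // scale applied to routed-expert outputs
};
```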
> > I see some differences in YaRN implementation between DeepSeek-V2 and llama.cpp (calculation of mscale). Is there any YaRN expert on board?
>
> There is this PR from...
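For context, the `mscale` in question is YaRN's attention-temperature correction, $\sqrt{1/t} = 0.1 \ln s + 1$, where $s$ is the context-extension factor; DeepSeek-V2's modeling code generalizes it with an extra coefficient taken from its `mscale`/`mscale_all_dim` config fields. A sketch of that generalized form:

```cpp
#include <cmath>

// YaRN's attention-temperature correction: sqrt(1/t) = 0.1 * ln(s) + 1.
// DeepSeek-V2's HF code adds a coefficient m (its `mscale` / `mscale_all_dim`
// config values); plain YaRN corresponds to m == 1.
static float yarn_get_mscale(float scale, float m) {
    if (scale <= 1.0f) {
        return 1.0f;
    }
    return 0.1f * m * std::log(scale) + 1.0f;
}
```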
> Would love to see support for the smaller [MoE models](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat). They seem to be good and only use 2.5b active parameters for token generation.

@CyberTimon I added support for...
> Hm, that's strange - what's the point of multiplying by `1.0`. Not sure if we should modify our implementation - probably we just need to disable YARN for DS2...
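A hedged reading of the HF modeling code suggests why the factor comes out to `1.0`: DeepSeek-V2's rope_scaling config sets `mscale` equal to `mscale_all_dim`, so the correction applied on the embedding side is a ratio of two equal values:

```cpp
// With yarn_get_mscale() as sketched above: the embedding-side factor is
// a ratio that reduces to exactly 1.0 whenever mscale == mscale_all_dim,
// which is the multiplication being questioned here.
static float embedding_mscale(float scale, float mscale, float mscale_all_dim) {
    return yarn_get_mscale(scale, mscale) / yarn_get_mscale(scale, mscale_all_dim);
}
```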
> Is the main branch code now able to support DeepseekV2 inference?

No, not yet.
> For those who want to have a test on DeepSeek-V2-Chat Light: [chatllm.cpp](https://github.com/foldl/chatllm.cpp) now supports it (with [conditions](https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md#chatinstruct-models)).
>
> Compared to @fairydreaming's code, this one tries to follow...