Judd
When loading a single-row matrix, an `n_dims` mismatch occurs: https://github.com/foldl/chatllm.cpp/blob/247be09f8247c972d5f77d6263a2d173b8dfab8d/chat.cpp#L394C11-L398C12 Special treatment is needed for single-row matrices, or more generally, for tensors whose trailing `ne[]` entries are 1.
For those who want to try DeepSeek-V2-Chat Light: [chatllm.cpp](https://github.com/foldl/chatllm.cpp) now supports it (with [conditions](https://github.com/foldl/chatllm.cpp/blob/master/docs/models.md#chatinstruct-models)). Compared with @fairydreaming 's code, this one tries to follow the paper, but...
@fairydreaming I'd rather not test perplexity. Instead, I compared each tensor of each layer against `modeling_deepseek.py`. The results show that the differences are caused by rounding errors.
False alarm, then.
FYI: Implementation in chatllm.cpp: https://github.com/foldl/chatllm.cpp/commit/887e6214cdf246249da9a68a896e6515ce875b00#diff-a68ca0be189c2d9485f5205180ba698c98de44d38cd81e8ee518cde0e9ae6d9c
[ChatLLM.cpp](https://github.com/foldl/chatllm.cpp) supports the Phi-3.5 MoE model now. For developers: the MoE sparse MLP is ~~the same as~~ a little different from the one used in Mixtral.
You can try [chatllm.cpp](https://github.com/foldl/chatllm.cpp), which supports GLM-4.
My test with Vulkan and an AMD iGPU (7840) on Windows works great.
Here is a function-calling example with Coder v2. https://github.com/foldl/chatllm.cpp/blob/master/docs/tool_calling.md
What is the decoded model?