Mukul Tripathi

Results 16 comments of Mukul Tripathi

If I have a Sapphire Rapids processor which is AMX-enabled, how do I ensure that AMX is enabled in llama.cpp? Currently I am building it with ```bash cmake...
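Before touching the build flags, it may be worth confirming the kernel actually exposes AMX to userspace. A minimal check (Linux-only; `amx_tile`, `amx_int8`, and `amx_bf16` are the standard Sapphire Rapids feature-flag names):

```shell
# Check whether this CPU advertises the AMX feature flags in /proc/cpuinfo.
# If none appear, no build option will enable AMX kernels.
if grep -qw amx_tile /proc/cpuinfo; then
  echo "AMX detected:"
  grep -o 'amx[a-z0-9_]*' /proc/cpuinfo | sort -u
else
  echo "AMX not detected on this CPU"
fi
```

If the flags are present but a build still ignores them, the llama.cpp build log is the next place to look for which CPU features the compile picked up.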

> Please DO NOT ADD `--cache_lens`

If I do not specify `--cache_lens`, then I am restricted to a 16k context length. How do I specify a 256k context length?
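For illustration only, a hedged sketch of what is being asked: sizing the ktransformers KV cache for a 256k context via `--cache_lens` (the flag name comes from the quoted advice above; the entry-point script, model path, and whether the server accepts a value this large are assumptions, not confirmed from ktransformers docs):

```shell
# Hypothetical ktransformers launch with an enlarged KV cache.
# 262144 tokens = 256k context; model path is a placeholder.
python ktransformers/server/main.py \
  --model_path deepseek-ai/DeepSeek-R1-0528 \
  --cache_lens 262144
```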

Here is the step-by-step tutorial to run it: https://www.youtube.com/watch?v=Xui3_bA26LE and here is the written guide: https://github.com/Teachings/AIServerSetup/blob/main/06-DeepSeek-R1-0528/01-DeepSeek-R1-0528-KTransformers-Setup-Guide.md Note: I have been unable to run it on 0.3.0 or 0.3.1...

Can you share your CUDA version, nvcc output, and the exact commands you ran to build it? I can try to reproduce the issue and find a fix.
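To gather the requested details in one go, something like this works (a small sketch; it reports each tool's first version line and tolerates missing tools rather than failing):

```shell
# Collect build-environment details: CUDA compiler, CMake, and host compiler.
for tool in nvcc cmake gcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $("$tool" --version | head -n1)"
  else
    echo "$tool: not found"
  fi
done
```

`nvidia-smi` additionally reports the driver version, which can differ from the toolkit version `nvcc` prints.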

What command did you use to start the server for Qwen3?
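For comparison, a typical llama.cpp server invocation looks something like this (model filename, context size, and port are placeholders; the asker's actual setup may use ktransformers instead, as elsewhere in this thread):

```shell
# Illustrative llama.cpp server start; -m = model, -c = context length.
./llama-server \
  -m /models/Qwen3-235B-A22B-Q4_K_M.gguf \
  -c 32768 \
  --port 8080
```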

> > Here is the step by step tutorial to run it: https://www.youtube.com/watch?v=Xui3_bA26LE and here is the written guide: https://github.com/Teachings/AIServerSetup/blob/main/06-DeepSeek-R1-0528/01-DeepSeek-R1-0528-KTransformers-Setup-Guide.md
> >
> > Note: I have been unable to run it...

I do not think this issue is resolved. I still have no way to set this on mobile. It would be nice to see the settings on the phone. I cannot set...

This issue is resolved now. Run `git pull` followed by `docker compose up --build` to update. ![Screenshot_20241112_143715_Samsung Internet](https://github.com/user-attachments/assets/dab2ba42-4a38-4d96-9314-aeca979dd2a6)

Specifically for the new R1-0528 (but results are similar for V3-0324): I have an AMX-supported PC, and I can confirm that performance with ktransformers is noticeably better than ik_llama...

@ubergarm and @ikawrakow Below is for the Qwen3 235-billion-parameter model. Thank you for the pointers! For the Qwen models, I added `-b 2048 -ub 2048` and that resulted in the...
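For readers wondering where those flags go: in llama.cpp-family servers (including ik_llama.cpp), `-b` sets the logical batch size and `-ub` the physical micro-batch size, both of which mainly affect prompt-processing throughput. A sketch of the invocation (model path and context size are placeholders, not the commenter's actual command):

```shell
# Illustrative invocation with enlarged batch sizes, as in the comment above.
./llama-server \
  -m /models/Qwen3-235B-A22B-Q4_K_M.gguf \
  -b 2048 -ub 2048 \
  -c 32768
```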