wingenlit

Results: 5 comments by wingenlit

The search is `unavailable due to technical issues` after their boss went to an important meeting in Beijing. That is what they said, clearly.

UPDATE: llama.cpp added support for mistral-nemo from version [`b3436`](https://github.com/ggerganov/llama.cpp/releases/tag/b3436) onwards, so llamafile should be updated soon. For information only: as a result, some earlier gguf checkpoints using fork version...

sorry about closing the issue without the inside knowledge. will wait for the problem to be resolved.

UPDATE: recent test results here. `llamafile-0.8.13` works with mistral-nemo now; great! unfortunately, it is distinctly slower than llama.cpp (my version is `b3949`). What am I missing here? LLAMAFILE (compile flags,...

It is actually possible to calculate the differences for each MoE expert first, ship the diff file into VRAM, and dynamically craft a base expert to reconstruct the target MoE experts in parallel...
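The idea above can be sketched as a toy example: store one shared "base" expert plus a per-expert diff, ship only the diffs, and rebuild every expert with a single broadcasted add. All names, shapes, and the choice of base (element-wise mean) are illustrative assumptions, not anything llamafile or llama.cpp actually implements.

```python
import numpy as np

# Hypothetical illustration of the diff-against-a-base idea for MoE weights.
# Shapes and the mean-as-base choice are made up for this sketch.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 8

# Pretend these are the FFN weight matrices of each MoE expert.
experts = [rng.standard_normal((d_model, d_ff)).astype(np.float32)
           for _ in range(n_experts)]

# One simple choice of base expert: the element-wise mean over all experts.
base = np.mean(experts, axis=0)

# The per-expert diff is all that needs to be shipped alongside the base
# (e.g. into VRAM); diffs are often more compressible than raw weights.
diffs = np.stack([w - base for w in experts])

# Reconstruct every expert "in parallel" with one broadcasted add.
reconstructed = base[None, ...] + diffs

assert np.allclose(reconstructed, np.stack(experts), atol=1e-5)
```

Whether the diffs are actually smaller or faster to move than the full expert weights depends on quantization and how similar the experts are, which this toy example does not address.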