wingenlit

Results: 5 comments by wingenlit

The search is `unavailable due to technical issues` after their boss went to an important meeting in Beijing. That is what they said, clearly.

UPDATE: llama.cpp added support for mistral-nemo from version [`b3436`](https://github.com/ggerganov/llama.cpp/releases/tag/b3436) onwards, so llamafile should be updated soon. For information only: as a result, some earlier gguf checkpoints using fork version...

sorry about closing the issue without the inside knowledge. will wait for the problem to be resolved.

UPDATE: recent test results here. `llamafile-0.8.13` works with mistral-nemo now; great! unfortunately, it is distinctly slower than llama.cpp (my version is `b3949`). What am I missing here? LLAMAFILE (compile flags,...

It is actually possible to calculate the differences for each MoE expert first, ship the diff file into VRAM, and dynamically craft a base expert to reconstruct the target MoE experts in parallel...
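The idea above can be sketched as a toy example: store one shared "base" expert plus a per-expert diff, ship only the diffs, and rebuild every expert with a single broadcasted add. All names, shapes, and the choice of base (element-wise mean) are illustrative assumptions, not anything llamafile or llama.cpp actually implements.

```python
import numpy as np

# Hypothetical illustration of the diff-against-a-base idea for MoE weights.
# Shapes and the mean-as-base choice are made up for this sketch.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 8

# Pretend these are the FFN weight matrices of each MoE expert.
experts = [rng.standard_normal((d_model, d_ff)).astype(np.float32)
           for _ in range(n_experts)]

# One simple choice of base expert: the element-wise mean over all experts.
base = np.mean(experts, axis=0)

# The per-expert diff is all that needs to be shipped alongside the base
# (e.g. into VRAM); diffs are often more compressible than raw weights.
diffs = np.stack([w - base for w in experts])

# Reconstruct every expert "in parallel" with one broadcasted add.
reconstructed = base[None, ...] + diffs

assert np.allclose(reconstructed, np.stack(experts), atol=1e-5)
```

Whether the diffs are actually smaller or faster to move than the full expert weights depends on quantization and how similar the experts are, which this toy example does not address.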