Vincent Bosch
> @vlbosch could you please try our latest nightly build version? I and some others noticed that the issue seems to be fixed and quite stable, at least on macOS,...
@danielhanchen I have an M3 Max MacBook Pro with 128GB unified RAM. Happy to help you port this great project to Apple Silicon 😃 If it helps, I can...
I am having the same issue on an M3 Max with 128GB RAM and Mistral Large-2407 in 4-bit. The model can be loaded with llama.cpp and HF Transformers using all...
I am using asitop to watch the activity. Up until the tokens are generated, 99-100% of the GPU is used at max MHz. Right after the first token is streamed, the GPU usage...
As per your suggestion in the other issue @awni I updated to (the latest) macOS Sequoia preview, but the issue persists. After a reboot and loading a large model like...
> Did you try setting the sysctl `sudo sysctl iogpu.disable_wired_collector=1`? That usually helps. Thanks! I can confirm this command works on macOS 15 DP 5, although it didn't work on...
The token generation speed is no different from macOS 14.6 after a fresh start. GPU utilization shows 100% continuously throughout generation. I am, however, under the impression that the...
@EricLBuehler Thanks for the quick reply! I can confirm that master builds correctly now. Maybe it's another issue, or maybe I don't understand how ISQ works, but when trying to run a...
Did you guys manage to successfully reproduce EAGLE 2 with Mistral? If so, I am curious as to the changes/settings that yield the best results. I'd like to train EAGLE...
I would also like the option to add another local embeddings model, such as BGE-M3. I tried adding it to the models folder myself, but couldn't get it to work...