Jim Lai
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that...
### Reminder
- [X] I have read the README and searched the existing issues.

### Reproduction
View the README.md at https://github.com/hiyouga/LLaMA-Factory and look up the hardware requirements table, checking between the 7B and 13B columns.

###...
### Environment
Self-Hosted (Bare Metal)

### System
Windows 11

### Version
1.11.4

### Desktop Information
Python 3.11.5
Current stable node.js

Occurs with every GGUF model I try, even with bare...
# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged...
### Reminder
- [X] I have read the README and searched the existing issues.

### System Info
N/A

### Reproduction
N/A

### Expected behavior
Consideration of the Adam-mini optimizer, which claims...
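For context: standard Adam keeps one second-moment estimate per parameter, while Adam-mini's claim is that a single shared estimate per parameter block is enough, which shrinks optimizer state substantially. A rough PyTorch sketch of one block update, as I understand the idea (my illustration, not the authors' implementation; bias correction is omitted for brevity):

```python
import torch

def adam_mini_step(param: torch.Tensor, grad: torch.Tensor, state: dict,
                   lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.999,
                   eps: float = 1e-8) -> None:
    """One optimizer step for a single parameter block."""
    m = state.setdefault("m", torch.zeros_like(param))
    v = state.setdefault("v", torch.zeros((), dtype=param.dtype))  # one scalar per block
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Adam would track grad**2 elementwise; Adam-mini shares the block mean instead.
    v.mul_(beta2).add_(grad.pow(2).mean(), alpha=1 - beta2)
    param.sub_(lr * m / (v.sqrt() + eps))
```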
The resulting model weights and the SLERP merge formula are here: https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B

An exl2 quant of the above works, but where did the extra 1B parameters come from?
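For anyone skimming: SLERP interpolates along the arc between the two (flattened, normalized) weight tensors rather than along a straight line. A minimal sketch of the formula in PyTorch (an illustration, not the exact code used for this merge; the near-parallel fallback to plain lerp is my assumption):

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors at fraction t."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    dot = torch.dot(a_flat / (a_flat.norm() + eps), b_flat / (b_flat.norm() + eps))
    theta = torch.acos(dot.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    s = torch.sin(theta)
    out = (torch.sin((1 - t) * theta) / s) * a_flat + (torch.sin(t * theta) / s) * b_flat
    return out.reshape(a.shape).to(a.dtype)
```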
I ask this because the llama.cpp project has a server which can accept multiple control vectors when provided as GGUF.
I'm having trouble loading Gemma2 2B It, which has bf16 weights rather than fp16. Is this something easily fixed? Using numpy 1.26.4 and torch 2.2.2+cu121.

Loading checkpoint shards: 100%|█████████████████| 2/2...
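If the failure happens in a numpy conversion step, the likely culprit is that numpy has no bfloat16 dtype, so calling `.numpy()` on bf16 tensors raises. A sketch of the workaround, assuming the model id and that the conversion is where it breaks: upcast before handing anything to numpy.

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the checkpoint's native dtype (bf16) instead of letting it upcast.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",  # model id assumed from the description
    torch_dtype=torch.bfloat16,
)

for name, p in model.named_parameters():
    # numpy cannot represent bfloat16, so upcast to fp32 (or fp16) first.
    arr = p.detach().cpu().to(torch.float32).numpy()
```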
Using Python 3.11, PyTorch 2.5.1, and CUDA 12.4 here. I was able to install a Triton wheel of this Windows port to ensure compatibility: https://github.com/woct0rdho/triton-windows

I also locally built a complete...
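In case it helps anyone else on Windows, here is a quick smoke test (my own, not from the triton-windows repo) that the wheel is actually picked up; `torch.compile`'s default inductor backend needs a working Triton to emit GPU kernels:

```python
import torch
import triton

print("triton", triton.__version__)

@torch.compile  # inductor emits Triton kernels for this on CUDA
def f(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + 2 * x

x = torch.randn(1024, device="cuda")
print(f(x).sum().item())  # runs only if the Triton backend works end to end
```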