Cgrandjean
I'm not sure what the problem is there. What you can try is to use llama.cpp directly to create the GGUF; they have a convert_to_gguf.py file there you can use...
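In case it helps, a minimal sketch of what I mean, assuming a local llama.cpp checkout; the converter script ships as `convert_hf_to_gguf.py` in recent versions (the name and flags have changed across releases, so check yours), and all paths below are placeholders:

```python
# Sketch only: call llama.cpp's HF -> GGUF converter on a merged 16-bit model.
# Script name/flags depend on your llama.cpp version; paths are placeholders.
import subprocess

subprocess.run(
    [
        "python",
        "llama.cpp/convert_hf_to_gguf.py",  # converter inside the llama.cpp repo
        "path/to/merged_16bit_model",       # HF-format directory of the merged model
        "--outfile", "model-f16.gguf",      # where to write the GGUF
        "--outtype", "f16",                 # keep 16-bit weights; quantize afterwards if needed
    ],
    check=True,
)
```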
Yeah, it seems those issues are all related. I can confirm I got not only the issue I described in #1877, but also this one with 16-bit adapter merging. I could...
I got the exact same issue with unsloth/Devstral-Small-2507-bnb-4bit after I merged my adapter.
Thanks for the answer. Before, I tried with 14B and got the same problem as with 32B. To be honest, I would expect to be able to fine-tune a...
Hey, thanks for coming back to the issue. I investigated the matter a bit; it's not VRAM OOMing but RAM. I could overcome the...
Hello, I'm still blocked on this subject. Even with `gpu_memory_utilization` set to 0.3, crashes due to RAM still happen. My guess is that it's due to vLLM offloading (...
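For reference, here is a minimal sketch of the kind of setup that still crashes for me; the model name and `max_seq_length` are just placeholders, not the exact values from my run:

```python
# Sketch of an Unsloth load with the vLLM backend and lowered GPU memory share.
# Model name and sequence length are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-14B-Instruct-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    load_in_4bit=True,
    fast_inference=True,           # enables the vLLM backend
    gpu_memory_utilization=0.3,    # lowering this does not stop the host-RAM crash
)
```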
Still the same issue; it happens rarely, but when it does, it breaks the whole process...
Did anyone get problems with merging adapters? Because I got that on a different machine, and additionally the 16-bit merged model does not work as expected either... So it must be...
@w601sxs what you can try is merging and saving with save_pretrained_merged in 16 bits, reloading the model, and creating a new adapter. To be honest I'm not sure...
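Roughly something like this sketch; the output directory, rank, and target modules are placeholders, so adjust to your setup and check the Unsloth docs for the exact `save_method` strings on your version:

```python
# Sketch: merge adapter -> save 16-bit -> reload merged model -> new LoRA adapter.
# Directory names and LoRA hyperparameters are placeholders.
from unsloth import FastLanguageModel

# 1) Merge the current adapter into the base weights and save as 16-bit.
model.save_pretrained_merged(
    "merged_16bit",                # output directory (placeholder)
    tokenizer,
    save_method="merged_16bit",
)

# 2) Reload the merged model as the new base.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="merged_16bit",
    max_seq_length=2048,
    load_in_4bit=False,
)

# 3) Attach a fresh LoRA adapter on top of the merged weights.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```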
Sorry guys for disappearing. I solved it by reloading the adapter without vLLM, and then merging worked just fine. I think it must be fixed in the latest versions anyway. Thanks for...
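Roughly what worked for me, with placeholder paths; the key point is loading without `fast_inference` so vLLM never gets involved before merging:

```python
# Sketch: reload the saved LoRA adapter without the vLLM backend, then merge.
# Paths are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/lora_adapter",  # adapter dir saved during training (placeholder)
    max_seq_length=2048,
    load_in_4bit=False,
    # fast_inference left off, so vLLM is never loaded
)

# Merge the LoRA weights into the base model and save in 16-bit.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```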