Srinivas Billa
# Prerequisites Please answer the following questions for yourself before submitting an issue. - [x] I am running the latest code. Development is very rapid so there are no tagged...
https://github.com/IST-DASLab/marlin
Hi, I'm trying to run the AWQ version of Mixtral on 4xA10s. However, I'm getting this error. I've also tried with `--mem-frac 0.7` and still got the same error. Model...
Hi, just wanted to check: isn't the CogVLM model actually 17B params, not 30? Thanks
Is it possible to add a way to generate multiple drafts for a given input, and then, based on what the user picks, save that data so that it can...
Hi, I've been trying to apply LoRA to the VITS model (hence the pull request for the conv1d). It turns out that just using LoRA for the text encoder transformer isn't enough,...
Hi, sorry if this is a stupid question, but is it possible to use the 8-bit GaLore optimiser in combination with LoRA adapters? Thanks
Wanted to make an issue for this instead of constantly asking on Discord. I saw the other ticket for multi-GPU fp16 training, which is also nice. But DDP would let...
Splitting hot and cold neurons across CPU and GPU allows faster inference when using larger models/higher quantisations. Demo shows 11x speedup over llama.cpp when using a 40B on a single...
Related to #1194: using packing degraded performance, as the samples in my dataset are not independent, and the correlation might have caused the issue. However, packing did help my training...