Results 35 issues of Srinivas Billa

# Prerequisites Please answer the following questions for yourself before submitting an issue. - [x] I am running the latest code. Development is very rapid so there are no tagged...

enhancement

https://github.com/IST-DASLab/marlin

Hi, I'm trying to run the AWQ version of Mixtral on 4xA10s. However, I'm getting this error. I've also tried with `--mem-frac 0.7` and still got the same error. Model...

bug

Hi, just wanted to check: isn't the CogVLM model actually 17B params, not 30B? Thanks

Is it possible to add a way to generate multiple drafts for a given input, and then, based on what the user picks, save that data so that it can...

enhancement
front
back

Hi, I've been trying to apply LoRA to the VITS model (hence the pull request for the conv1d). Turns out just using LoRA for the text encoder transformer isn't enough,...

Hi, sorry if this is a stupid question, but is it possible to use the 8-bit GaLore optimiser in combination with LoRA adapters? Thanks

Wanted to make an issue for this instead of constantly asking in Discord. I saw the other ticket for multi-GPU fp16 training, which is also nice. But DDP would let...

currently fixing

Splitting hot and cold neurons across CPU and GPU allows faster inference when using larger models/higher quantisations. Demo shows 11x speedup over llama.cpp when using a 40B on a single...

feature request

Related to #1194: using packing deteriorated performance, as samples in my dataset are not independent, and the correlation might have caused the issue. However, packing did help my training...

enhancement
help wanted
good second issue