musicurgy
> This issue is only present with GPTQ. I compared B&B 8-bit with GPTQ 8-bit, and GPTQ was the only one with a delay. If the B&B 4-bit implementation (coming eventually)...
I get a bunch of dependency errors when launching despite setting up LLaMA beforehand (definitely my own fault and probably because of a messed-up conda environment):

```
ModuleNotFoundError: No...
```
Yeah, after a bit of a struggle I ended up getting it working by just copying all the dependencies into the webui folder. So far the model is really interesting....
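For anyone hitting the same `ModuleNotFoundError`, a minimal diagnostic sketch to see which packages are missing from the active conda environment. The dependency list here is an assumption based on typical webui requirements; adjust it to match your setup's `requirements.txt`:

```python
import importlib

# Hypothetical dependency list -- adjust to match your webui's requirements.
REQUIRED = ["torch", "transformers", "accelerate", "sentencepiece"]

for name in REQUIRED:
    try:
        importlib.import_module(name)
        print(f"OK:      {name}")
    except ModuleNotFoundError:
        print(f"MISSING: {name} (install it in the active conda environment)")
```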
I also had a problem using two GPUs when testing 13B 16-bit LLaMA. I have a 3090 (24GB) and a 3060 (12GB). Unfortunately, when using the two together, for whatever reason...
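A minimal sketch of one way to split a model across an asymmetric pair like this, assuming the Hugging Face transformers/accelerate loader is in play. The checkpoint path and per-GPU memory caps are placeholders; leave headroom below each card's total VRAM for activations:

```python
import torch
from transformers import AutoModelForCausalLM

# Cap how much of the model lands on each card so the smaller 3060
# isn't overfilled. Requires `accelerate` to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf",               # hypothetical local checkpoint
    device_map="auto",                     # let accelerate place the layers
    max_memory={0: "22GiB", 1: "10GiB"},   # GPU 0 = 3090, GPU 1 = 3060
    torch_dtype=torch.float16,             # 16-bit weights, as in the test above
)
```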
> @musicurgy did you figure out how to fix that issue? Or did you use another framework? I'm on the same hardware.

I'm currently using the 4-bit LLaMA model, which has...
Already writing implementations for 4-bit, love it. How fast is inference when running LLaMA 30B 4-bit on a 3090?
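For anyone wanting to measure this themselves, a rough timing sketch. It assumes `model` and `tokenizer` are already loaded by whatever 4-bit loader you use; the function name, prompt, and token count are placeholders:

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt="Hello, world", new_tokens=128):
    # Time a single greedy generation and report generated tokens per second.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    generated = output.shape[-1] - inputs["input_ids"].shape[-1]
    return generated / elapsed
```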