musicurgy
> This issue is only present with GPTQ. I compared B&B 8-bit with GPTQ 8-bit, and GPTQ was the only one with a delay. If the B&B 4-bit implementation (coming eventually)...
I get a bunch of dependency errors when launching despite setting up LLaMA beforehand (definitely my own fault and probably because of a messed-up conda environment):

```
ModuleNotFoundError: No...
```
Yeah, after a bit of a struggle I ended up getting it working by just copying all the dependencies into the webui folder. So far the model is really interesting....
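For anyone hitting the same `ModuleNotFoundError`, a minimal diagnostic sketch to see which packages are missing from the active conda environment. The dependency list here is an assumption based on typical webui requirements; adjust it to match your setup's `requirements.txt`:

```python
import importlib

# Hypothetical dependency list -- adjust to match your webui's requirements.
REQUIRED = ["torch", "transformers", "accelerate", "sentencepiece"]

for name in REQUIRED:
    try:
        importlib.import_module(name)
        print(f"OK:      {name}")
    except ModuleNotFoundError:
        print(f"MISSING: {name} (install it in the active conda environment)")
```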
I also had a problem using two GPUs when testing 13B 16-bit LLaMA. I have a 3090 (24GB) and a 3060 (12GB). Unfortunately, when using the two together, for whatever reason...
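A minimal sketch of one way to split a model across an asymmetric pair like this, assuming the Hugging Face transformers/accelerate loader is in play. The checkpoint path and per-GPU memory caps are placeholders; leave headroom below each card's total VRAM for activations:

```python
import torch
from transformers import AutoModelForCausalLM

# Cap how much of the model lands on each card so the smaller 3060
# isn't overfilled. Requires `accelerate` to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b-hf",               # hypothetical local checkpoint
    device_map="auto",                     # let accelerate place the layers
    max_memory={0: "22GiB", 1: "10GiB"},   # GPU 0 = 3090, GPU 1 = 3060
    torch_dtype=torch.float16,             # 16-bit weights, as in the test above
)
```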
> @musicurgy did you figure out how to fix that issue? Or did you use another framework? I'm on the same hardware.

I'm currently using the 4-bit LLaMA model, which has...
Already writing implementations for 4-bit, love it. How fast is inference when running LLaMA 30B 4-bit on a 3090?
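For anyone wanting to measure this themselves, a rough timing sketch. It assumes `model` and `tokenizer` are already loaded by whatever 4-bit loader you use; the function name, prompt, and token count are placeholders:

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt="Hello, world", new_tokens=128):
    # Time a single greedy generation and report generated tokens per second.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    generated = output.shape[-1] - inputs["input_ids"].shape[-1]
    return generated / elapsed
```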