Paul Richardson
Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...
Did you install the dependencies from the requirements.txt file? https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
Once LLaMA is converted to HF format, can it then be converted to NumPy format just like OPT?