Paul Richardson
Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...
Did you install the dependencies from the requirements.txt file? https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
Once LLaMA is converted to HF format, can it then be converted to NumPy format just like OPT?