pyllama
LLaMA: Open and Efficient Foundation Language Models
Could you put the actual text of the command to run inference with quantization? I cannot see the image because I'm blind and use a screen reader. The README says "With quantization, you...
I have quantized the 13B model to 2-bit by executing: `python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt` After quantization, when I run the test inference, the output...
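For context on what a 2-bit checkpoint contains: each weight is stored as an integer code in {0..3} that gets packed into wider words on disk and mapped back to floats with a scale and zero point at load time. A minimal sketch of that packing arithmetic (illustrative only — four codes per byte here, not pyllama's actual on-disk layout, and `scale`/`zero` handling is simplified to a single pair):

```python
import torch

def pack_2bit(codes: torch.Tensor) -> torch.Tensor:
    """Pack four 2-bit codes (integer values 0-3) into each uint8."""
    codes = codes.reshape(-1, 4)
    packed = torch.zeros(codes.shape[0], dtype=torch.uint8)
    for i in range(4):
        packed |= codes[:, i] << (2 * i)
    return packed

def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    """Recover the flat sequence of 2-bit codes from the packed bytes."""
    return torch.stack(
        [(packed >> (2 * i)) & 0x3 for i in range(4)], dim=1
    ).reshape(-1)

def dequant(codes: torch.Tensor, scale: float, zero: float) -> torch.Tensor:
    """Map integer codes back to float weights: w = scale * (q - zero)."""
    return scale * (codes.float() - zero)
```

With only four levels per weight, the reconstruction error is large, which is consistent with 2-bit inference producing noticeably degraded output compared to 4-bit or 8-bit.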
pyllama-7B2B.pt download link https://pan.baidu.com/s/1zOdKOHnSCsz6TFix2NTFtg The link tries to get me to download an executable called BaiduNetdisk_7.26.0.10.exe
I was able to convert the LLaMA weights, quantize them, and run inference using qwopqwop200/GPTQ-for-LLaMa. However, I can't load the result using pyllama. Thanks! 1. Clone: `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda` 2. Install: `pip...
Is there a way to skip evaluation after quantizing? It just takes forever on Colab!
Hi, I'm interested in trying to run LLaMA models. I'm using a MacBook with an AMD GPU, so the easiest option would probably be to use the CPU. It would be nice to know if it's...
It's just so convenient to be able to auto-format code.
Has anyone got this dockerized for easy install?
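Not aware of an official image, but a minimal sketch of a Dockerfile is below. Everything in it is an assumption: that `pip install pyllama` pulls in workable dependencies inside a slim Python image, and that weights are mounted into the container at run time rather than baked into the image.

```dockerfile
# Sketch only — not an official pyllama image.
FROM python:3.10-slim

# Assumes pyllama and its dependencies install cleanly from PyPI.
RUN pip install --no-cache-dir pyllama

# Mount your converted/quantized model weights here when running
# the container, e.g. `docker run -v $PWD/models:/models ...`.
VOLUME /models
WORKDIR /models

CMD ["python"]
```

A CUDA base image (e.g. an `nvidia/cuda` runtime tag) would be needed instead of `python:3.10-slim` for GPU inference; the sketch above covers CPU-only use.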