pyllama
LLaMA: Open and Efficient Foundation Language Models
Could you put the actual text of the command to run inference with quantization? I cannot see the image because I'm blind and use a screen reader. The README says "With quantization, you...
I have quantized the 13B model to 2-bit by executing: `python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt` After quantization, when I run the test inference, the output...
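For context on what a 2-bit checkpoint contains: each weight is stored as an integer code in {0..3} that gets packed into wider words on disk and mapped back to floats with a scale and zero point at load time. A minimal sketch of that packing arithmetic (illustrative only — four codes per byte here, not pyllama's actual on-disk layout, and `scale`/`zero` handling is simplified to a single pair):

```python
import torch

def pack_2bit(codes: torch.Tensor) -> torch.Tensor:
    """Pack four 2-bit codes (integer values 0-3) into each uint8."""
    codes = codes.reshape(-1, 4)
    packed = torch.zeros(codes.shape[0], dtype=torch.uint8)
    for i in range(4):
        packed |= codes[:, i] << (2 * i)
    return packed

def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    """Recover the flat sequence of 2-bit codes from the packed bytes."""
    return torch.stack(
        [(packed >> (2 * i)) & 0x3 for i in range(4)], dim=1
    ).reshape(-1)

def dequant(codes: torch.Tensor, scale: float, zero: float) -> torch.Tensor:
    """Map integer codes back to float weights: w = scale * (q - zero)."""
    return scale * (codes.float() - zero)
```

With only four levels per weight, the reconstruction error is large, which is consistent with 2-bit inference producing noticeably degraded output compared to 4-bit or 8-bit.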
pyllama-7B2B.pt download link https://pan.baidu.com/s/1zOdKOHnSCsz6TFix2NTFtg The link tries to get me to download an executable called BaiduNetdisk_7.26.0.10.exe
I was able to convert the LLaMA weights, quantize them, and run inference using qwopqwop200/GPTQ-for-LLaMa. However, I can't load the result using pyllama. Thanks! 1. Clone: `git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b cuda` 2. Install: `pip...
Is there a way to skip evaluation after quantizing? It just takes forever on Colab!
Hi, I'm interested in trying to run LLaMA models. I'm using a MacBook with an AMD GPU, so the easiest option would probably be to use the CPU. It would be nice to know if it's...
It's just so convenient to be able to auto-format code.
Has anyone got this dockerized for easy install?
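Not aware of an official image, but a minimal sketch of a Dockerfile is below. Everything in it is an assumption: that `pip install pyllama` pulls in workable dependencies inside a slim Python image, and that weights are mounted into the container at run time rather than baked into the image.

```dockerfile
# Sketch only — not an official pyllama image.
FROM python:3.10-slim

# Assumes pyllama and its dependencies install cleanly from PyPI.
RUN pip install --no-cache-dir pyllama

# Mount your converted/quantized model weights here when running
# the container, e.g. `docker run -v $PWD/models:/models ...`.
VOLUME /models
WORKDIR /models

CMD ["python"]
```

A CUDA base image (e.g. an `nvidia/cuda` runtime tag) would be needed instead of `python:3.10-slim` for GPU inference; the sketch above covers CPU-only use.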