
LLaMA: Open and Efficient Foundation Language Models

Results: 69 pyllama issues

I downloaded pyllama and converted the weights to Hugging Face format. When I run the following command, `python3 -m fastchat.model.apply_delta --base converted7B/ --target llama_to_vicuna --delta lmsys/vicuna-7b-delta-v1.1`, the following error is raised: `...
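For context on what the command above is doing: FastChat's `apply_delta` adds Vicuna's delta weights element-wise onto the base LLaMA weights. A minimal toy sketch of that idea, using plain Python lists instead of the torch tensors the real tool operates on (the function name and dicts here are illustrative, not FastChat's actual code):

```python
def apply_delta(base, delta):
    """Element-wise add delta weights onto base weights (toy sketch)."""
    return {name: [b + d for b, d in zip(base[name], delta[name])]
            for name in base}

# Hypothetical two-element "weight" for illustration only.
base = {"layer.weight": [0.1, -0.2]}
delta = {"layer.weight": [0.05, 0.3]}
merged = apply_delta(base, delta)
```

If the error occurs here, a common cause is a shape or vocabulary-size mismatch between the converted base model and the delta checkpoint.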

The error is as follows: `python webapp_single.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH` Traceback (most recent call last): File "/home/xxxx/chatllama/pyllama/apps/gradio/webapp_single.py", line 80, in generator = load( File "/home/u/chatllama/pyllama/apps/gradio/webapp_single.py", line 42, in...

Hi, I'm quantizing the models following the README, but there's one thing common to every run that uses the `groupsize` parameter: in each case the perplexity goes through the roof and the...
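For anyone reproducing this check: perplexity is just the exponential of the mean per-token negative log-likelihood, so a quantization bug shows up as a large jump in this value. A minimal sketch of the computation (the function name and the NLL inputs are illustrative):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nlls) / len(nlls))

# A model that assigns probability 1/2 to every token has perplexity 2.
ppl = perplexity([math.log(2)] * 100)
```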

Hello all, I installed the project's requirements, but when I try to execute the following command, `python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt`, I get this message...
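For background on what `--wbits 2` asks for: each weight is mapped onto one of 2^2 = 4 discrete levels. A toy uniform-quantization sketch illustrating the idea (this is a deliberate simplification; pyllama uses GPTQ, which chooses levels far more carefully than this):

```python
def quantize(weights, bits=2):
    """Uniform affine quantization to `bits` bits (toy sketch, not GPTQ)."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # avoid division by zero
    q = [round((w - lo) / scale) for w in weights]        # integer codes
    deq = [lo + qi * scale for qi in q]                   # reconstructed floats
    return q, deq

codes, approx = quantize([0.0, 1.0, 0.34], bits=2)
```

At 2 bits the reconstruction error is large, which is why 2-bit runs are much more fragile than 4- or 8-bit ones.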

Thanks for this great work. I'm wondering how to run inference on a single 8 GB GPU, like the example shown in the README. I tried it on my RTX 2080 Ti with...
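A quick back-of-the-envelope check for whether a model fits in a given GPU: weight memory is roughly parameter count times bits per weight, divided by 8 to get bytes (this ignores activations, KV cache, and framework overhead, so treat it as a lower bound):

```python
def model_memory_gib(n_params, bits):
    """Approximate weight memory in GiB for n_params at `bits` per weight."""
    return n_params * bits / 8 / 2 ** 30

# A 7B model: ~13 GiB at 16-bit (won't fit in 8 GB),
# but ~3.3 GiB at 4-bit, which leaves headroom on an 8 GB card.
fp16 = model_memory_gib(7e9, 16)
int4 = model_memory_gib(7e9, 4)
```

This is why the README's single-GPU example relies on quantized weights.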

It seems the inference script sometimes doesn't work with Chinese characters: `Prompt: ['what are you taking about'] 🦙LLaMA: what are you taking about? 什么地方有某种事物?` (roughly: "Where is there some kind of thing?")... please enter your prompts (Ctrl+C to...

Here's the data in the README: ![image](https://user-images.githubusercontent.com/57667856/230556706-c0476743-ff8e-4a5f-b517-5099f5439dd6.png) This is the MD5 value of my model, and this is the size of my model: Inevitably, the output of the model is totally...