exllama issues

Issue when attempting to run exllama (P40)

3

When starting with the command `python test_benchmark_inference.py -d /home/rexommendation/Programs/KoboldAI/models/30B-Lazarus-GPTQ4bit -p -ppl` (I keep my models in other programs), I get the following error:

> Traceback (most recent call last): >...

wereretot

Issue with open_llama_3b quantization by GPTQ-for-Llama

5

```
(exllama) dungnt@symato:~/ext_hdd/repos/gau/exllama$ python test_benchmark_inference.py -d /home/dungnt/ext_hdd/repos/Nhan/GPTQ-for-LLaMa/checkpoints/open_llama_3b/ -v -ppl
 -- Perplexity:
 -- - Dataset: datasets/wikitext2_val_sample.jsonl
 -- - Chunks: 100
 -- - Chunk size: 2048 -> 2048
 -- - Chunk overlap:...
```

Iambestfeed

Setup package

1

Supports installing `exllama` as a package. Example usage:

```
pip install 'exllama_lib @ git+https://github.com/paolorechia/exllama@setup-package'
```

EDIT: Worth explaining how to use the installed package. Since the installation setup creates a...

paolorechia

Question: Does GPU splitting take more ram than running on a single GPU?

4

Is there any loss when splitting?

nikshepsvn

OOM/CUDA errors when running in batch mode?

4

```
import argparse
import os
import glob
import time
import subprocess
from itertools import cycle
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
#...
```

nikshepsvn

Expected size 2048 but got size ...

```
python3 test_benchmark_inference.py -d ../data/model/ -ppl -ppl_ds datasets/wikitext2.txt -ppl_cn 40 -l 4096 -ppl_cs 4096 -ppl_ct 4096 -cpe 2
 -- Perplexity:
 -- - Dataset: datasets/wikitext2.txt
 -- - Chunks: 40
 -- -...
```

Jeduh

Cannot compile exllama_ext on ROCm

2

I am using oobabooga's webui, which includes exllama. I cloned exllama into the repositories, installed the dependencies and am ready to compile it. However, it seems like my system won't...

fgdfgfthgr-fox

Classifier-Free Guidance

8

Just a heads up on CFG, a technique in which: "Models can perform as well as a model 2x as large" at the cost of 2x the computation, but that...

ortegaalfredo
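
For readers unfamiliar with the technique named in the issue above, here is a minimal sketch of the logit-mixing step commonly used for classifier-free guidance in text generation. It is not taken from exllama or from the issue: the function names (`cfg_logits`, `softmax`), the toy logit values, and the `scale` value are all assumptions chosen for illustration. The "2x the computation" cost comes from needing two forward passes per token, one with the conditioning prompt and one without.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Blend the two logit vectors: uncond + scale * (cond - uncond).

    scale = 1.0 reproduces the conditional logits unchanged;
    scale > 1.0 pushes the distribution further toward the prompt.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 3-token vocabulary; the numbers are made up for illustration.
cond = [2.0, 1.0, 0.0]    # logits from the pass that sees the full prompt
uncond = [1.0, 1.0, 1.0]  # logits from the pass without the conditioning prompt

guided = cfg_logits(cond, uncond, scale=1.5)
probs = softmax(guided)
print(guided)
print(probs)
```

In a real generator the two logit vectors would come from two model calls at every decoding step, and sampling would then proceed from the guided distribution as usual.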

Support non-Llama architectures

exLlama saved GPTQ, I've gone from 6 tokens/s to over 40, thank you! Currently it only supports Llama-based models. Here are a few other promising architectures: MPT, Falcon...

dred0n

libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

1

I get this error with exllama running elinas alpaca 4bit safetensors. Previously I never got this issue; not sure if it's going to impact performance or cause random crashes. I...

tpfwrz
