exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Results 99 exllama issues

When starting with the command `python test_benchmark_inference.py -d /home/rexommendation/Programs/KoboldAI/models/30B-Lazarus-GPTQ4bit -p -ppl` (I keep my models in another program's directory), I get the following error:

> Traceback (most recent call last):
> ...

```
(exllama) dungnt@symato:~/ext_hdd/repos/gau/exllama$ python test_benchmark_inference.py -d /home/dungnt/ext_hdd/repos/Nhan/GPTQ-for-LLaMa/checkpoints/open_llama_3b/ -v -ppl
-- Perplexity:
-- - Dataset: datasets/wikitext2_val_sample.jsonl
-- - Chunks: 100
-- - Chunk size: 2048 -> 2048
-- - Chunk overlap:...
```

Supports installing `exllama` as a package. Example usage:

```
pip install 'exllama_lib @ git+https://github.com/paolorechia/exllama@setup-package'
```

EDIT: Worth explaining how to use the installed package. Since the installation setup creates a...

```python
import argparse
import os
import glob
import time
import subprocess
from itertools import cycle

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
# ...
```

```
python3 test_benchmark_inference.py -d ../data/model/ -ppl -ppl_ds datasets/wikitext2.txt -ppl_cn 40 -l 4096 -ppl_cs 4096 -ppl_ct 4096 -cpe 2
-- Perplexity:
-- - Dataset: datasets/wikitext2.txt
-- - Chunks: 40
-- -...
```
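For readers unfamiliar with the numbers these perplexity runs report (dataset, chunk count, chunk size, chunk overlap): perplexity is the exponential of the mean per-token negative log-likelihood, and chunking lets a long dataset be scored within a fixed context length. The sketch below is a generic, minimal illustration under stated assumptions, not the logic of `test_benchmark_inference.py`. `logprob_fn` is a hypothetical stand-in for a model call returning log P(target | context), and the windowing scheme (overlapping tokens serve only as context and are never rescored) is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scoring callback: returns log P(target | context) from some model.
LogProbFn = Callable[[Sequence[int], int], float]


def chunk_windows(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, score_from) windows covering n_tokens.

    Consecutive windows share `overlap` tokens; that shared prefix is used only
    as context, so every token after the first is scored exactly once.
    """
    if chunk_size < 1 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size >= 1 and 0 <= overlap < chunk_size")
    windows = []
    stride = chunk_size - overlap
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        # The very first token has no context, so scoring starts at index 1.
        # With overlap == 0, later windows score their first token with empty context.
        score_from = 1 if start == 0 else overlap
        windows.append((start, end, score_from))
        if end == n_tokens:
            break
        start += stride
    return windows


def chunked_perplexity(tokens: Sequence[int], chunk_size: int, overlap: int,
                       logprob_fn: LogProbFn) -> float:
    """Perplexity = exp(mean negative log-likelihood over the scored tokens)."""
    total_nll = 0.0
    scored = 0
    for start, end, score_from in chunk_windows(len(tokens), chunk_size, overlap):
        window = tokens[start:end]
        for i in range(score_from, len(window)):
            # Context is limited to the current window, like a fixed max sequence length.
            total_nll -= logprob_fn(window[:i], window[i])
            scored += 1
    if scored == 0:
        raise ValueError("need at least two tokens to compute perplexity")
    return math.exp(total_nll / scored)


# Usage: a model that is uniform over a 50-token vocabulary has perplexity 50
# (the printed value may differ from 50 by floating-point rounding).
uniform = lambda context, target: math.log(1 / 50)
print(chunked_perplexity(list(range(10)), chunk_size=4, overlap=2, logprob_fn=uniform))
```

A larger overlap gives each scored token more context (usually lowering perplexity) at the cost of more model calls, which is one reason results from runs with different chunk settings are not directly comparable.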

I am using oobabooga's webui, which includes exllama. I cloned exllama into the repositories folder, installed the dependencies, and am ready to compile it. However, it seems like my system won't...

Just a heads-up on CFG (classifier-free guidance), a technique where "models can perform as well as a model 2x as large" at the cost of 2x the computation, but that...
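As a hedged illustration of where the "2x the computation" comes from: classifier-free guidance for language models is commonly formulated as running the model twice per generated token, once with the conditioning prompt and once without it, then blending the two next-token log-probabilities as `uncond + gamma * (cond - uncond)`. The sketch below is a minimal pure-Python version under that assumption; it is not exllama code, and `cfg_logprobs`, `gamma`, and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cfg_logprobs(cond_logits: Sequence[float], uncond_logits: Sequence[float],
                 gamma: float) -> List[float]:
    """Blend conditional and unconditional next-token log-probs.

    gamma = 1 reproduces the conditional distribution, gamma = 0 the
    unconditional one, and gamma > 1 pushes further toward what the prompt adds.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("both passes must score the same vocabulary")
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    mixed = [u + gamma * (c - u) for c, u in zip(cond, uncond)]
    # Renormalize so the result is again a valid log-distribution.
    return log_softmax(mixed)


# Each generated token needs two forward passes (with and without the prompt),
# which is where the roughly 2x compute cost comes from.
cond_logits = [2.0, 1.0, 0.0]    # hypothetical logits from the prompted pass
uncond_logits = [1.0, 1.0, 1.0]  # hypothetical logits from the unprompted pass
guided = cfg_logprobs(cond_logits, uncond_logits, gamma=1.5)
next_token = max(range(len(guided)), key=guided.__getitem__)  # greedy pick
print(next_token)  # → 0
```

Because the blend is followed by a renormalization, the guided distribution can be handed to any ordinary sampler; greedy selection is used here only to keep the example short.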

ExLlama saved GPTQ for me: I've gone from 6 tokens/s to over 40, thank you! Currently it only supports Llama-based models. Here are a few other promising architectures: MPT, Falcon...

I get this error with exllama when running elinas' alpaca 4-bit safetensors. I never got this issue before, and I'm not sure whether it's going to impact performance or cause random crashes. I...