exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
When starting with the command 'python test_benchmark_inference.py -d /home/rexommendation/Programs/KoboldAI/models/30B-Lazarus-GPTQ4bit -p -ppl' (I keep my models in other programs), I get the following error:
> Traceback (most recent call last):
>...
```
(exllama) dungnt@symato:~/ext_hdd/repos/gau/exllama$ python test_benchmark_inference.py -d /home/dungnt/ext_hdd/repos/Nhan/GPTQ-for-LLaMa/checkpoints/open_llama_3b/ -v -ppl
-- Perplexity:
-- - Dataset: datasets/wikitext2_val_sample.jsonl
-- - Chunks: 100
-- - Chunk size: 2048 -> 2048
-- - Chunk overlap:...
```
Supports installing `exllama` as a package. Example usage:
```
pip install 'exllama_lib @ git+https://github.com/paolorechia/exllama@setup-package'
```
EDIT: Worth explaining how to use the installed package. Since the installation setup creates a...
Is there any loss when splitting?
```
import argparse
import os
import glob
import time
import subprocess
from itertools import cycle

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator
#...
```
```
python3 test_benchmark_inference.py -d ../data/model/ -ppl -ppl_ds datasets/wikitext2.txt -ppl_cn 40 -l 4096 -ppl_cs 4096 -ppl_ct 4096 -cpe 2
-- Perplexity:
-- - Dataset: datasets/wikitext2.txt
-- - Chunks: 40
-- -...
```
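For readers comparing the perplexity figures in reports like the one above, here is a minimal sketch of how a chunked perplexity score is usually defined: the exponential of the mean negative log-likelihood over all evaluated tokens. The function name `perplexity` and the list-of-lists input are hypothetical, chosen for illustration; this is not taken from `test_benchmark_inference.py`, whose exact chunking and overlap handling may differ.

```python
import math


def perplexity(chunk_logprobs):
    """Perplexity over several chunks of per-token log-probabilities.

    chunk_logprobs: one list per chunk, each holding the natural-log
    probability the model assigned to the correct next token.
    (Hypothetical input format, assumed for this sketch.)
    """
    total_logprob = 0.0
    total_tokens = 0
    for logprobs in chunk_logprobs:
        total_logprob += sum(logprobs)
        total_tokens += len(logprobs)
    if total_tokens == 0:
        raise ValueError("need at least one token to compute perplexity")
    # exp(mean negative log-likelihood): lower means the model is less surprised
    return math.exp(-total_logprob / total_tokens)


# A model that assigns probability 0.25 to every correct token
# behaves like a uniform guess among 4 options.
uniform_chunks = [[math.log(0.25)] * 3, [math.log(0.25)] * 5]
print(perplexity(uniform_chunks))
```

Because the score is averaged per token, results are only comparable between runs that use the same dataset, chunk size, and number of chunks.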
I am using oobabooga's webui, which includes exllama. I cloned exllama into the repositories, installed the dependencies and am ready to compile it. However, it seems like my system won't...
Just a heads up on CFG, a technique in which: "Models can perform as well as a model 2x as large" at the cost of 2x the computation, but that...
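As a rough illustration of where the 2x computation comes from, here is a minimal sketch of classifier-free guidance as it is commonly described for language models: each step runs the model twice, once with the prompt and once without, and the two next-token distributions are blended. The helper names `log_softmax` and `cfg_logits` and the `scale` parameter are hypothetical; nothing here is exllama's API.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_logits(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional next-token scores.

    cond_logits:   logits from the forward pass that saw the prompt
    uncond_logits: logits from the forward pass without the prompt
    scale:         guidance strength (assumed convention: 1.0 means no
                   guidance, larger values push toward the prompt)
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    # Move away from the unconditional distribution, toward the conditional one
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Two forward passes per token are needed, hence roughly twice the compute.
guided = cfg_logits([2.0, 1.0, 0.0], [1.0, 1.0, 1.0], scale=1.5)
next_token = max(range(len(guided)), key=guided.__getitem__)
print(next_token)
```

The blended scores are unnormalized, so they would typically be passed through softmax again before sampling.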
exLlama saved GPTQ, I've gone from 6 tokens/s to over 40, thank you! Currently it only supports Llama-based models. Here are a few other promising architectures, such as: MPT, Falcon...
I get this error with exllama running elinas alpaca 4bit safetensors. Previously I never got this issue; not sure if it's going to impact performance or cause random crashes. I...