exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Results 99 exllama issues

Hello, I have a server with Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz and 5x WX9100 and want to run Mistral 7b on each GPU. But I received an error:...

Can someone help me with this error, please? Traceback (most recent call last): File "C:\Users\cheth\Music\new chaya\OneReality\OneRealityMemory.py", line 68, in <module> ExLlamatokenizer = ExLlamaV2Tokenizer(config) File "C:\Python310\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 192, in __init__ self.eos_token =...

Here's another bug on Oobabooga's project that is unresolved... https://github.com/oobabooga/text-generation-webui/issues/2923 I realized that the ExLlama team may have a solution.... So thought I'd cross post this issue on this project,...

Thank you for your work... as I've not seen this mentioned I thought I would post, in the hopes that this will save others frustration and support the work. I...

Getting this on inference when I have a lora loaded (loading the lora itself doesn't produce any errors). Using text-generation-webui. `File "/home/user/text-generation-webui/modules/models.py", line 309, in clear_torch_cache torch.cuda.empty_cache() File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/memory.py", line...

I am seeing suboptimal output when running on a 2080 ti compared to running on an A100. 1) When running python example_basic.py with Neko-Institute-of-Science/LLaMA-7B-4bit-128g I get this: Using a 2080...

Hi, I'm curious whether it's possible to load a model if you don't have enough system RAM but do have enough VRAM. I have 32 GB of system RAM and 48 GB of VRAM,...

Does it support the safetensors format?

Hello! I am trying to use beam search while doing inference on my GPTQ quantized 4-bit Llama model whose base model is `daekeun-ml/Llama-2-ko-instruct-13B`. I got an error like this: ```bash...