exllama issues

Run on CPU without AVX2

3

Hello, I have a server with an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz and 5x WX9100, and I want to run Mistral 7B on each GPU. But I received an error:...

ZanMax
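
The first-generation Xeon E5-2620 (Sandy Bridge-EP) supports AVX but not AVX2, which AVX2 code paths need. A quick way to confirm what a Linux host reports is to look for the `avx2` flag in `/proc/cpuinfo`. This is a minimal sketch that assumes Linux; `cpu_flags` and `has_avx2` are hypothetical helpers, not part of exllama.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags listed on 'flags' lines of /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """True if any CPU entry in the text advertises the avx2 flag."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: present on Linux only
    if path.exists():
        print("AVX2 supported:", has_avx2(path.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes Linux")
```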

piece id is out of range

Can someone help me with this error, please? Traceback (most recent call last): File "C:\Users\cheth\Music\new chaya\OneReality\OneRealityMemory.py", line 68, in ExLlamatokenizer = ExLlamaV2Tokenizer(config) File "C:\Python310\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 192, in __init__ self.eos_token =...

chethanwiz
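
`piece id is out of range` is the message the SentencePiece Python wrapper raises when asked to look up an id at or beyond its vocabulary size. Since the traceback stops at `self.eos_token = ...`, one plausible cause (not confirmed in the issue) is a model config whose special-token ids do not fit the `tokenizer.model` vocabulary. The hypothetical helper below sketches that check; the vocabulary size and ids in the example are made up.

```python
from typing import Optional


def out_of_range_ids(vocab_size: int,
                     special_ids: dict[str, Optional[int]]) -> dict[str, int]:
    """Return the special-token ids a vocabulary of `vocab_size` pieces cannot
    look up (valid ids are 0 .. vocab_size - 1). Hypothetical helper."""
    bad: dict[str, int] = {}
    for name, token_id in special_ids.items():
        if token_id is None:
            continue  # token not defined in the config; nothing to check
        if not 0 <= token_id < vocab_size:
            bad[name] = token_id
    return bad


if __name__ == "__main__":
    # Made-up values: a 32000-piece vocabulary paired with a config whose
    # eos_token_id points one past the end of that vocabulary.
    ids = {"bos_token_id": 1, "eos_token_id": 32000, "pad_token_id": None}
    print(out_of_range_ids(32000, ids))
```

With SentencePiece installed, the vocabulary size of a real `tokenizer.model` can be read with `SentencePieceProcessor.get_piece_size()`, and the ids compared against the ones in the model's `config.json`.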

Multi-GPU issues

9

Here's another unresolved bug in Oobabooga's project (https://github.com/oobabooga/text-generation-webui/issues/2923). I realized that the ExLlama team may have a solution, so I thought I'd cross-post this issue on this project,...

nktice

Updates since 0.0.11 cause the code to fail to compile on Ubuntu (23.04, 23.10) with AMD HIP/ROCm (5.6, 5.7, 6.0, ...)

2

Thank you for your work... As I haven't seen this mentioned, I thought I would post it, in the hope that it will save others frustration and support the work. I...

nktice

When will the bfloat16 variant of the GPTQ algorithm be supported?

Kelang-Tian
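
For context on the question: bfloat16 keeps float32's 8-bit exponent and shortens the mantissa to 7 bits, whereas float16 has a 5-bit exponent and a 10-bit mantissa (largest finite value 65504). The sketch below converts a Python float to its bfloat16 bit pattern by keeping the upper 16 bits of the float32 encoding with round-to-nearest-even. It only illustrates the number format; it says nothing about how exllama or GPTQ kernels would handle bfloat16, and NaN inputs are not treated specially.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 (nearest, ties to even) and return the 16-bit pattern.
    x must be representable as float32; NaN is not handled specially."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1            # lowest bit that survives truncation
    rounded = bits + 0x7FFF + lsb     # round to nearest, ties to even
    return (rounded >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern to a Python float (exact widening)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


if __name__ == "__main__":
    # 1e5 is beyond float16's range but fits in bfloat16 with reduced precision.
    print(hex(float_to_bf16_bits(1e5)), bf16_bits_to_float(float_to_bf16_bits(1e5)))
```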

Illegal memory access when using a LoRA

32

I get this during inference when a LoRA is loaded (loading the LoRA itself doesn't produce any errors). Using text-generation-webui. `File "/home/user/text-generation-webui/modules/models.py", line 309, in clear_torch_cache torch.cuda.empty_cache() File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/memory.py", line...

ipb26

Bad output on a 2080 Ti

2

I am seeing worse output when running on a 2080 Ti than on an A100. 1) When running python example_basic.py with Neko-Institute-of-Science/LLaMA-7B-4bit-128g I get this: Using a 2080...

filipemesquita

Is it possible to load a model with low system RAM?

4

Hi, I'm curious whether it's possible to load a model if you don't have enough system RAM but do have enough VRAM. I have 32 GB of system RAM and 48 GB of VRAM,...

gros87
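
One way to frame the question is to estimate how large the quantized weights are. This rough sketch assumes GPTQ-style 4-bit weights with one fp16 scale and one packed 4-bit zero point per group of 128 weights (as in model names ending in `-4bit-128g`), and it ignores tensors kept in fp16 (such as embeddings), activations, and the KV cache. The parameter counts are illustrative. Whether the result matters for system RAM depends on whether the loader stages the whole checkpoint in RAM before moving it to the GPUs, which the issue does not say.

```python
def quantized_weight_gb(n_params: float, bits: int = 4, group_size: int = 128,
                        scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Rough size in GB (10**9 bytes) of group-quantized weights.

    Assumes every weight is quantized and each group of `group_size` weights
    stores one scale and one zero point. Real checkpoints are somewhat larger.
    """
    bits_per_weight = bits + (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        print(f"{name}: ~{quantized_weight_gb(params):.1f} GB of 4-bit weights")
```

Under these assumptions a 70B-class model comes to roughly 36 GB of weights: more than 32 GB of system RAM, but within 48 GB of VRAM.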

Does it support the safetensors format?

lucasjinreal

Error when using Beam Search

Hello! I am trying to use beam search while doing inference on my GPTQ-quantized 4-bit Llama model, whose base model is `daekeun-ml/Llama-2-ko-instruct-13B`. I got an error like this:...

bibekyess
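
For readers unfamiliar with the feature: beam search keeps the `beam_width` highest-scoring partial sequences at each step instead of committing to the single most likely token. The sketch below is a simplified version (finished beams are set aside and the active beam count shrinks) run over a toy next-token function. `toy_logprobs` is a hypothetical stand-in for a model, and this is not exllama's implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[int, ...]


def beam_search(next_logprobs: Callable[[Seq], Dict[int, float]],
                start: Seq, beam_width: int, steps: int,
                eos: int) -> List[Tuple[Seq, float]]:
    """Return up to beam_width (sequence, total log-prob) pairs, best first."""
    beams: List[Tuple[Seq, float]] = [(start, 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(steps):
        # Expand every active beam by every possible next token.
        candidates = []
        for seq, score in beams:
            for tok, lp in next_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Keep only the highest-scoring expansions; set aside ones ending in EOS.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Hypothetical toy "model": fixed probabilities; token 0 is EOS.
TOY = {
    (): {1: 0.6, 2: 0.4},
    (1,): {0: 0.3, 1: 0.4, 2: 0.3},
    (2,): {0: 0.9, 1: 0.05, 2: 0.05},
}


def toy_logprobs(seq: Seq) -> Dict[int, float]:
    probs = TOY.get(seq, {0: 1.0})  # any longer prefix always emits EOS
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    for seq, score in beam_search(toy_logprobs, (), beam_width=2, steps=5, eos=0):
        print(seq, round(math.exp(score), 2))
```

In this toy case greedy decoding picks token 1 first (probability 0.6) and ends with a sequence of probability 0.24, while a beam of width 2 also keeps token 2 alive and finds `(2, 0)` with probability 0.36.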
