RedDragonGecko
I had a similar issue with a different model, and renaming it to add -4bit on the end worked for me, e.g. llama7b-4bit-128g-4bit.pt, or whatever your file is actually called...
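In case it helps, here's roughly what that rename looks like (just a sketch; the directory and filenames are placeholders, so substitute whatever your checkpoint is actually called):

```python
import os

# Placeholder paths -- point these at your actual model folder and file.
model_dir = "models/llama-7b-4bit-128g"
old_path = os.path.join(model_dir, "llama7b-4bit-128g.pt")
new_path = os.path.join(model_dir, "llama7b-4bit-128g-4bit.pt")

# Appending "-4bit" is what let the loader pick the file up as a 4-bit checkpoint for me.
os.rename(old_path, new_path)
```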
So how might this be implemented for oobabooga's text-generation-webui? Explain it to me like I'm 5.
I'm also getting RuntimeError: CUDA error: an illegal memory access was encountered. I'm also using 2x 3090, with the latest oobabooga webui, the latest exllama, and the latest SillyTavern. Happens randomly during...
(Within oobabooga webui) I have set the split to 16,23. nvidia-smi output (truncated): NVIDIA-SMI 530.41.03, Driver Version 530.41.03, CUDA Version 12.1...
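For reference, this is roughly how that split maps onto exllama if you drive it directly instead of through the webui (a sketch, assuming you run from the exllama repo root where model.py provides these classes, and if I'm reading exllama's config right; the model paths are placeholders):

```python
# Sketch: reproducing the 16,23 split with the exllama library directly.
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("models/llama-30b-4bit/config.json")    # placeholder path
config.model_path = "models/llama-30b-4bit/model.safetensors"  # placeholder path
config.set_auto_map("16,23")  # cap GPU 0 at ~16 GB and GPU 1 at ~23 GB

model = ExLlama(config)
cache = ExLlamaCache(model)
```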
Doing further testing within oobabooga webui, I have gotten (I think) the same error to occur again, right after this one: key_states = cache.key_states[self.index].narrow(2, 0, past_len + q_len) RuntimeError:...
cache = ExLlamaCache(model, max_seq_len = 2056) makes the model output pure gibberish. Even just setting it to 2049 makes it hallucinate badly. With further testing I believe context going over...
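If the trained context window is the culprit, here's a minimal sketch of what I'd expect to stay coherent (assuming stock LLaMA weights trained at 2048 tokens, run from the exllama repo root; the paths are placeholders):

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("models/llama-13b-4bit/config.json")    # placeholder path
config.model_path = "models/llama-13b-4bit/model.safetensors"  # placeholder path
config.max_seq_len = 2048  # LLaMA's trained context; going past this degrades output

model = ExLlama(config)
# Leave max_seq_len unset so the cache inherits the model's limit instead of
# overriding it -- a cache longer than the trained context is where the
# gibberish seems to start.
cache = ExLlamaCache(model)
```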