RedDragonGecko
I had a similar issue with a different model, and renaming it to add -4bit on the end worked for me, e.g. llama7b-4bit-128g-4bit.pt, or whatever your file is actually called...
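In case it helps, here's roughly what that rename looks like (just a sketch; the directory and filenames are placeholders, so substitute whatever your checkpoint is actually called):

```python
import os

# Placeholder paths -- point these at your actual model folder and file.
model_dir = "models/llama-7b-4bit-128g"
old_path = os.path.join(model_dir, "llama7b-4bit-128g.pt")
new_path = os.path.join(model_dir, "llama7b-4bit-128g-4bit.pt")

# Appending "-4bit" is what let the loader pick the file up as a 4-bit checkpoint for me.
os.rename(old_path, new_path)
```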
So how might this be implemented for oobabooga's text-generation-webui? Explain it to me like I'm 5.
I'm also getting RuntimeError: CUDA error: an illegal memory access was encountered. I'm also using 2x 3090, with the latest oobabooga webui, the latest exllama, and the latest SillyTavern. Happens randomly during...
(Within oobabooga webui) I have set the split to 16,23. nvidia-smi output (truncated): NVIDIA-SMI 530.41.03, Driver Version 530.41.03, CUDA Version 12.1...
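For reference, this is roughly how that split maps onto exllama if you drive it directly instead of through the webui (a sketch, assuming you run from the exllama repo root where model.py provides these classes, and if I'm reading exllama's config right; the model paths are placeholders):

```python
# Sketch: reproducing the 16,23 split with the exllama library directly.
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("models/llama-30b-4bit/config.json")    # placeholder path
config.model_path = "models/llama-30b-4bit/model.safetensors"  # placeholder path
config.set_auto_map("16,23")  # cap GPU 0 at ~16 GB and GPU 1 at ~23 GB

model = ExLlama(config)
cache = ExLlamaCache(model)
```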
Doing further testing within oobabooga webui, I have gotten (I think) the same error to occur again, right after this one: key_states = cache.key_states[self.index].narrow(2, 0, past_len + q_len) RuntimeError:...
cache = ExLlamaCache(model, max_seq_len = 2056) makes the model output pure gibberish. Even just setting it to 2049 makes it hallucinate badly. With further testing I believe context going over...
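If the trained context window is the culprit, here's a minimal sketch of what I'd expect to stay coherent (assuming stock LLaMA weights trained at 2048 tokens, run from the exllama repo root; the paths are placeholders):

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("models/llama-13b-4bit/config.json")    # placeholder path
config.model_path = "models/llama-13b-4bit/model.safetensors"  # placeholder path
config.max_seq_len = 2048  # LLaMA's trained context; going past this degrades output

model = ExLlama(config)
# Leave max_seq_len unset so the cache inherits the model's limit instead of
# overriding it -- a cache longer than the trained context is where the
# gibberish seems to start.
cache = ExLlamaCache(model)
```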