exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
HuggingFace -> Hugging Face
I have the same problem both in ooba and when cloning this repo directly. I'm not sure what is causing it. It seems the prompt is read quickly, but there is...
Using text-gen webui with the Exllama loader gives me different results than with Exllama_HF. Specifically, Exllama_HF produces gibberish with SuperHOT 8K models past 2048 tokens. Even the logits of the two...
https://kaiokendev.github.io/til#extending-context-to-8k Someone had the clever idea of scaling the positional embeddings inversely proportional to the extended context length. Adding
```
self.scale = 1 / 2
t *= self.scale
```
after...
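The two lines above amount to linear position interpolation: positions in the extended context are compressed by a constant factor so they map back into the range the model saw during training. A minimal sketch of the idea applied to rotary-embedding angles (the function name and structure here are illustrative, not exllama's actual implementation):

```python
import math


def rope_angles(dim, seq_len, scale=1.0, base=10000.0):
    """Rotary-embedding angles with linear position scaling.

    scale < 1 compresses positions: e.g. scale = 1/2 lets a model
    trained on a 2048-token context cover 4096 tokens, because
    position 4095 is rotated as if it were position ~2047.
    Returns a seq_len x (dim // 2) list of angles.
    """
    # Standard RoPE inverse frequencies for each even dimension pair.
    inv_freq = [base ** (-i / dim) for i in range(0, dim, 2)]
    angles = []
    for t in range(seq_len):
        t_scaled = t * scale  # the `t *= self.scale` step from the snippet
        angles.append([t_scaled * f for f in inv_freq])
    return angles
```

With `scale = 0.5`, position `2t` in the extended context yields exactly the angles of position `t` in the original context, which is why the model's learned attention patterns still apply.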
Has anyone had success compiling on SageMaker? There is probably a lot more for me to explore, but I just wanted to check if anyone has faced the same issues I...
Sorry, I'm a little confused. It seems that the project is unable to load LoRAs trained by the AutoGPTQ project, though it can load LoRAs trained by alpaca-lora-4bit. Here's my...
New functions: - change session color - add italic and bold text support - hide left and right columns - regenerate any message (or the last one) - continue...
Hi, thanks a lot for this project, really interesting! I'm interested in trying to hook it up with the guidance library (https://github.com/microsoft/guidance). Before I attempt any coding, though, it would be...
I think adding this as an example makes the most sense; it is a relatively complete example of a conversation-model setup using Exllama and LangChain. I've probably made some...
Hi, really nice work here! I really appreciate that you've brought Llama inference to consumer-grade GPUs! There is an ongoing project, https://github.com/openlm-research/open_llama, which seems to have a lot...