exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
HuggingFace -> Hugging Face
I have the same problem both in ooba and when cloning this repo directly. I'm not sure what is causing it. It seems the prompt is read quickly, but there is...
Using text-gen webui with the Exllama loader gives me different results than with Exllama_HF. Specifically, Exllama_HF produces gibberish with SuperHOT 8K models past 2048 tokens. Even the logits of the two...
https://kaiokendev.github.io/til#extending-context-to-8k Someone had the clever idea of scaling the positional embeddings inversely proportional to the extended context length. Adding
```
self.scale = 1 / 2
t *= self.scale
```
after...
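The two lines above amount to linear position interpolation: positions in the extended context are compressed by a constant factor so they map back into the range the model saw during training. A minimal sketch of the idea applied to rotary-embedding angles (the function name and structure here are illustrative, not exllama's actual implementation):

```python
import math


def rope_angles(dim, seq_len, scale=1.0, base=10000.0):
    """Rotary-embedding angles with linear position scaling.

    scale < 1 compresses positions: e.g. scale = 1/2 lets a model
    trained on a 2048-token context cover 4096 tokens, because
    position 4095 is rotated as if it were position ~2047.
    Returns a seq_len x (dim // 2) list of angles.
    """
    # Standard RoPE inverse frequencies for each even dimension pair.
    inv_freq = [base ** (-i / dim) for i in range(0, dim, 2)]
    angles = []
    for t in range(seq_len):
        t_scaled = t * scale  # the `t *= self.scale` step from the snippet
        angles.append([t_scaled * f for f in inv_freq])
    return angles
```

With `scale = 0.5`, position `2t` in the extended context yields exactly the angles of position `t` in the original context, which is why the model's learned attention patterns still apply.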
Has anyone had success compiling on SageMaker? There is probably a lot more for me to explore, but I just wanted to check if anyone has faced the same issues I...
Sorry, I'm a little confused. It seems that the project is unable to load LoRAs trained by the AutoGPTQ project, though it can load LoRAs trained by alpaca-lora-4bit. Here's my...
New functions: - change session color - add italic and bold text support - hide left and right columns - regenerate any message (or the last one) - continue...
Hi, thanks a lot for this project, really interesting! I'm interested in trying to hook it up with the guidance library (https://github.com/microsoft/guidance). Before I attempt any coding, though, it would be...
I think adding this as an example makes the most sense; it is a relatively complete example of a conversation-model setup using Exllama and LangChain. I've probably made some...
Hi, really nice work here! I really appreciate that you've brought Llama inference to consumer-grade GPUs! There is an ongoing project, https://github.com/openlm-research/open_llama, which seems to have a lot...