exllama
Support non-Llama architectures
ExLlama saved GPTQ: I've gone from 6 tokens/s to over 40, thank you! Currently it only supports Llama-based models.
There are a few other promising architectures, such as MPT, Falcon, SalesForce, StarCoder, and ChatGPT.
Are there plans to support these other architectures?