exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Results: 99 exllama issues

```
Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/modules/callbacks.py", line 56, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/home/ubuntu/text-generation-webui/modules/text_generation.py", line 361, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return...
```

I'm sorry, but I am unable to find relevant documentation on the Internet about how to load all modules onto the GPU. I got this error message from my code:
```
Found modules...
```
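
For reference, here is a minimal sketch of loading every layer onto the GPU(s) with an explicit device split, assuming the `ExLlamaConfig` / `set_auto_map` API used in exllama's example scripts; the model directory and the per-GPU memory split below are placeholders:

```python
import os, glob

# These imports follow exllama's example scripts (e.g. example_basic.py);
# adjust them if the modules are packaged differently in your environment.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/path/to/model"                                  # placeholder
tokenizer_path = os.path.join(model_dir, "tokenizer.model")
config_path = os.path.join(model_dir, "config.json")
model_path = glob.glob(os.path.join(model_dir, "*.safetensors"))[0]

config = ExLlamaConfig(config_path)
config.model_path = model_path
config.set_auto_map("16,24")   # assumed meaning: GB of VRAM to use on GPU 0 and GPU 1

model = ExLlama(config)        # loads all weights onto the mapped GPUs
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# Generate a short completion to verify the model loaded.
print(generator.generate_simple("Hello, my name is", max_new_tokens=20))
```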

I am getting this error when trying to do inference with CodeLLaMA34B from The-Bloke + a LoRA trained on the same model using alpaca_lora_4bit. Commenting out the `generator.lora` line works....
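
For comparison, a minimal sketch of attaching a LoRA to an exllama generator, following the pattern of the repo's `example_lora.py`; the adapter paths are placeholders, and `model` / `generator` are assumed to already be loaded as in the previous sketch:

```python
# Follows the pattern of exllama's example_lora.py; paths are placeholders.
from lora import ExLlamaLora

lora_config_path = "/path/to/lora/adapter_config.json"   # placeholder
lora_path = "/path/to/lora/adapter_model.bin"            # placeholder

# `model` and `generator` are assumed to be an ExLlama model and an
# ExLlamaGenerator created as shown in the earlier sketch.
lora = ExLlamaLora(model, lora_config_path, lora_path)
generator.lora = lora    # attach the adapter; set to None to disable it again
```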

Baichuan2 is a new model that has better overall results compared to Llama. `https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat` It works with AutoGPTQ, but I encountered an error with exllama. ![3997267f0ea6fc9c7992296e90da7bf5](https://github.com/turboderp/exllama/assets/397630/4a49b9ff-2b03-442c-bd05-da9d376e735e)

Hey! This adds the correct Llama 2 Chat prompt formatting to `example_llama2chat.py`. This PR uses https://github.com/turboderp/exllama/pull/195 to copy the exact implementation from the [original Llama repo](https://github.com/facebookresearch/llama/). The format for...
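
For context, the Llama 2 Chat format wraps each user turn in `[INST] ... [/INST]` and folds the system prompt into the first instruction inside `<<SYS>> ... <</SYS>>`. A rough sketch of building such a prompt (not the PR's actual code) might look like:

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_llama2_prompt(system_prompt, turns):
    """turns: list of (user_message, assistant_reply) pairs; the last reply may be None."""
    # Fold the system prompt into the first user message, as the reference code does.
    first_user, first_reply = turns[0]
    prompt = f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{first_user} {E_INST}"
    if first_reply is not None:
        prompt += f" {first_reply} "
    for user, reply in turns[1:]:
        prompt += f"{B_INST} {user} {E_INST}"
        if reply is not None:
            prompt += f" {reply} "
    return prompt

print(build_llama2_prompt("You are a helpful assistant.",
                          [("Hello!", "Hi there!"), ("What is exllama?", None)]))
```

Note that the BOS/EOS tokens that separate turns are omitted here; the reference implementation handles those at the tokenizer level.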

I am getting this on my Mac M1 (Ventura 13.5.2) with Python 3.11.5:
```
Traceback (most recent call last):
  File "/Users/user/code/project/text-generation-webui/server.py", line 29, in <module>
    from modules import (
  File "/Users/user/code/project/text-generation-webui/modules/ui_default.py",...
```

I'm writing a production server to handle requests from a large number of rotating clients. I have a custom manager class that handles everything, but I'm hoping to keep the...

Hello! I've been experiencing a problem in exllama (both versions) with this particular model. The model only outputs '\n' when used with exllama. I've come across this problem in two different ways:...

I followed the instructions and got the error below:
```
python3 example_chatbot.py -d models/airoboros/model.safetensors -un "Jeff" -p prompt_chatbort.txt
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line...
```

Hello everyone, I'm trying to set up exllama on an Azure ML compute instance. I followed the instructions here https://github.com/turboderp/exllama, but unfortunately I'm getting an error when trying to call this...