exllama Codellama support


exllama/model.py", line 45, in __init__
    self.pad_token_id = read_config["pad_token_id"]
KeyError: 'pad_token_id'

Aug 25 '23 08:08 lucasjinreal

Just add this to the config.json

"pad_token_id": 0,
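If you would rather not edit the file by hand, here is a minimal sketch of the same fix in Python. The helper name `ensure_pad_token_id` and the example path are made up for illustration; the value 0 is just the one suggested above, not something checked against the model's tokenizer.

```python
import json
from pathlib import Path


def ensure_pad_token_id(config_path, pad_token_id=0):
    """Add "pad_token_id" to a model's config.json if the key is missing.

    Returns True if the file was changed, False if the key was already there.
    The default of 0 is the value suggested in this thread (an assumption).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "pad_token_id" in config:
        return False  # leave an existing value untouched
    config["pad_token_id"] = pad_token_id
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Usage would be something like `ensure_pad_token_id("path/to/model/config.json")`, where the path is a placeholder for wherever your model folder lives.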

Aug 25 '23 11:08 dred0n

Just add this to the config.json

"pad_token_id": 0,

Where is the config.json?

Aug 26 '23 20:08 ShahZ181

It's the config.json that should be part of your files: https://huggingface.co/TheBloke/CodeLlama-13B-Python-GPTQ/tree/main

So did anyone manage to get coherent sentences out of the model yet? It barely acknowledges my questions.

Aug 26 '23 21:08 pan324

So did anyone manage to get coherent sentences out of the model yet? It barely acknowledges my questions.

I have tried Phind-CodeLlama-34B with example-chatbot.py, and the output is really bad and repeats words endlessly. I have read that people have gotten it to work, so maybe it's an exllama issue, I don't know. I am new to all of this.

I also tried the new WizardCoder-Python-34B but it gives me this error:

with safe_open(self.config.model_path, framework = "pt", device = "cpu") as f:
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

Aug 26 '23 21:08 ShahZ181

WizardCoder-Python-34B works well for me. All the other TheBloke models seem defective.

Aug 27 '23 00:08 dred0n

I also tried the new WizardCoder-Python-34B but it gives me this error:

with safe_open(self.config.model_path, framework = "pt", device = "cpu") as f:
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

I fixed this issue by deleting the model and downloading it again. And I can confirm WizardCoder Python is the only one that works well for me so far.
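Since a fresh download fixed it, the file was most likely truncated or corrupted. Here is a rough sketch for checking that before loading, assuming the documented safetensors layout (8-byte little-endian header length, then a JSON header, then the tensor data). The function name `safetensors_looks_complete` is made up for illustration, and this only compares sizes; it does not verify the tensor contents.

```python
import json
import os
import struct


def safetensors_looks_complete(path):
    """Return True if the file size matches what the safetensors header promises."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False  # too short to even hold the header length
        (header_len,) = struct.unpack("<Q", prefix)
        if 8 + header_len > file_size:
            return False  # the JSON header itself is cut off
        header = json.loads(f.read(header_len))
    # Each tensor entry stores [begin, end] byte offsets relative to the data section.
    data_end = max(
        (entry["data_offsets"][1]
         for name, entry in header.items()
         if name != "__metadata__"),
        default=0,
    )
    return 8 + header_len + data_end == file_size
```

If this returns False for the downloaded .safetensors file, deleting it and downloading again (as described above) is probably the way to go.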

Aug 27 '23 00:08 ShahZ181

@dred0n I think you were right, especially about these quantized models (the problems might mainly be caused by quantization).

Did you test WizardCoder-34B quantized, with exllama?

Aug 27 '23 04:08 lucasjinreal

@dred0n Hi, can you share your quantized WizardCoder-34B model? Is it GPTQ?

Aug 27 '23 13:08 lucasjinreal

@lucasjinreal Yes, it works well. I'm using TheBloke's WizardCoder-34B and the results are the same as the demo WizardLM put up.

Aug 27 '23 17:08 dred0n

@dred0n How about the quantized model? Which inference framework did you use: exllama, llama.cpp, or HF?

Aug 28 '23 02:08 lucasjinreal
