Dougie777
I don't have time to fork this, so I will just post the fixed code here and a link to a working jsfiddle. UPDATE: I added the ability to put...
For example, openchat 3.5 wants this prompt template format: GPT4 User: {prompt}GPT4 Assistant: I tried a few things and managed to crash the server, so I am stuck. Can anyone...
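A minimal sketch of filling in that template, assuming a plain string substitution is all that is needed (the helper name `build_openchat_prompt` is hypothetical, and the real OpenChat 3.5 template may insert additional special tokens between turns that the quoted snippet omits):

```python
def build_openchat_prompt(user_message: str) -> str:
    # Substitute the user message into the template exactly as quoted above.
    # Assumption: no extra separator tokens are required between the turns.
    return f"GPT4 User: {user_message}GPT4 Assistant:"

prompt = build_openchat_prompt("What is the capital of France?")
print(prompt)
```

If the server exposes a per-model prompt-template setting, the same string with a `{prompt}` placeholder could be passed there instead of formatting it client-side.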
Could there be some new GGUF format that we need to update the code for, or something?
Long generations don't return data, but the server says 200 OK. The Swagger screen just says LOADING forever.
How to reproduce:

**1) Model being used:**

```python
wizardlm_70b_q4_gguf = LlamaCppModel(
    model_path="wizardlm-70b-v1.0.Q4_K_M.gguf",  # manual download
    max_total_tokens=4096,
    use_mlock=False,
)
```

**2)** From Swagger, run this query against the chat completion endpoint. Please note...
```
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
```

The exact same settings and quantization work for 7B and 13B. Here is...
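A possible reading of that shape mismatch, sketched below as arithmetic: LLaMA-2 70B uses grouped-query attention (GQA) with fewer key/value heads than query heads, which shrinks the `wk` projection, while 7B and 13B use one KV head per query head. If the loader predates GQA support it would expect the old square shape. The head counts below are the published LLaMA-2 70B values, but treating them as the cause of this specific error is an assumption; updating the llama.cpp build/binding is the usual thing to try.

```python
# LLaMA-2 70B attention dimensions (published config values).
hidden_size = 8192
n_heads = 64
head_dim = hidden_size // n_heads   # 128
n_kv_heads = 8                      # GQA: 8 KV heads; 7B/13B have n_kv_heads == n_heads

# Shape of wk's output dimension under each scheme:
wk_out_mha = n_heads * head_dim     # what a pre-GQA loader expects
wk_out_gqa = n_kv_heads * head_dim  # what the 70B GGUF actually stores

print(wk_out_mha, wk_out_gqa)  # 8192 1024 -- matches "expected 8192 x 8192, got 8192 x 1024"
```

The numbers line up exactly with the error message, which is why a llama.cpp version too old to know about GQA is a plausible culprit.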