Chat template for Codellama70b? Getting terrible and off-topic output compared to web-hosted Codellama70b

Open ewebgh33 opened this issue 1 year ago • 5 comments

Validations

  • [X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • [X] I'm not able to find an open issue that requests the same enhancement

Problem

I've been testing Codellama:70b-instruct in Continue for the past day and a half.

I rolled back to 0.8.4, as I get no output from 0.8.5 onwards when using Ollama.

Anyway, I am getting a lot of terrible results, especially compared to entering the same prompts into Perplexity Labs' Codellama70b (which lets people test it for free).

Now I've read that there is apparently a prompt-formatting issue, as detailed here: https://www.reddit.com/r/LocalLLaMA/comments/1afweyw/quick_headsup_about_using_codellama_70b_and/

I'm trying to work out what my issue is. Is the prompt template format the reason I am getting terrible and off-topic output from Codellama70b in Continue compared to other implementations? If so, what can we do about it?

Thanks

Solution

No response

ewebgh33 avatar Feb 01 '24 01:02 ewebgh33

@EmmaWebGH I just became aware of the prompting issues this afternoon and they have been solved in 0.8.7. I tested with both the "free-trial" and "together" providers.

If you're seeing template-related problems while using Ollama, though, the problem might be on the Ollama side, as we rely on them to format messages. I haven't had the chance to test, since I don't have a GPU large enough for CodeLlama-70b, so let me know and I can reach out to them.
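
If the template does turn out to be wrong on the Ollama side, one possible stopgap is to bake CodeLlama-70b's "Source:"/<step> chat format and a stop sequence into a custom Ollama Modelfile. A rough sketch only; the exact whitespace of the template is an assumption, so double-check it against the Reddit thread linked above:

    # Modelfile (sketch) - override the chat template for codellama:70b-instruct
    FROM codellama:70b-instruct

    # CodeLlama-70b-Instruct uses "Source:" headers separated by <step>
    TEMPLATE """{{ if .System }}Source: system

     {{ .System }} <step> {{ end }}Source: user

     {{ .Prompt }} <step> Source: assistant
    Destination: user

     """

    # stop generation at the step separator
    PARAMETER stop "<step>"

Then build it with `ollama create codellama-70b-step -f Modelfile` (the name is arbitrary) and point the Continue model entry at that tag instead.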

sestinj avatar Feb 01 '24 03:02 sestinj

Great, I will update when I can! You really are on top of everything. Thankyou for such a useful extension!

ewebgh33 avatar Feb 01 '24 03:02 ewebgh33

I've run into the same issue. I started the CodeLlama 70B Instruct GGUF on my Mac Studio (M1) with the llama.cpp server like this (-np 2 gives two parallel slots, -c 4096 sets the context size):

    ../llama.cpp/server -m ./codellama-70b-instruct.Q5_K_S.gguf -np 2 -c 4096 --host 0.0.0.0 --port 8080

And I configured the model in config.json:

    {
      "title": "codellama-70b",
      "model": "codellama-70b",
      "completionOptions": {},
      "contextLength": 4096,
      "provider": "llama.cpp",
      "apiBase": "http://dev.myserver.com:8080" 
    },

In the VS Code Continue plugin, it keeps outputting lots of code and messages without stopping. I wonder how I could set the proper prompt template and the stop token.
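
For what it's worth, stop sequences can probably be set through completionOptions in the same model entry; a minimal sketch, assuming the llama.cpp provider passes a "stop" array through (the strings are CodeLlama-70b's step separator and assistant header):

    {
      "title": "codellama-70b",
      "model": "codellama-70b",
      "provider": "llama.cpp",
      "apiBase": "http://dev.myserver.com:8080",
      "contextLength": 4096,
      "completionOptions": {
        "stop": ["<step>", "Source: assistant"]
      }
    },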

For huggingface chat-ui, the following config works:

{
	"name": "codellama-70b-llamacpp",
	"chatPromptTemplate" : "<s>{{#if @root.preprompt}}Source: system\n\n {{@root.preprompt}} <step> {{/if}}{{#each messages}}{{#ifUser}}Source: user\n\n {{content}} <step> {{/ifUser}}{{#ifAssistant}}Source: assistant\n\n {{content}} <step> {{/ifAssistant}}{{/each}}Source: assistant\nDestination: user\n\n ",
	
	"parameters": {
		"temperature": 0.5,
		"top_p": 0.95,
		"repetition_penalty": 1.2,
		"top_k": 50,
		"truncate": 3072,
		"max_new_tokens": 2048,
		"stop" : ["<step>", "Source: assistant"]
	},
	
	"endpoints": [{
		"type": "openai",
		"baseURL": "http://dev.myserver.com:8080/v1"
	
	}]
}
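
As a server-side sanity check, the same stop strings can also be sent straight to the llama.cpp server's completion endpoint, bypassing any client-side templating (a sketch based on llama.cpp's server API; the prompt text is just an illustration):

    curl http://dev.myserver.com:8080/completion \
      -H "Content-Type: application/json" \
      -d '{
            "prompt": "Source: user\n\n How do I reverse a list in Python? <step> Source: assistant\nDestination: user\n\n ",
            "n_predict": 256,
            "stop": ["<step>", "Source: assistant"]
          }'

If the response still rambles past "<step>" here, the problem is on the server side rather than in Continue.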

davideuler avatar Feb 12 '24 15:02 davideuler

@davideuler Everything in your config looks right and seems to indicate that the prompt should be set correctly. Here is the code where we format the prompt for codellama-70b. You could double-check that the correct formatting is being sent by going to the "Output" tab in the bottom bar of VS Code (next to the terminal) and then selecting "Continue - ..." in the dropdown on the right. It shows all raw prompts/completions.

If this looks correct, then perhaps there is a bad interaction with the server (e.g. it also formats the prompt, so the formatting happens twice).
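
If it helps, the raw prompt shown there should follow CodeLlama-70b's "Source:"/<step> layout, roughly what the chatPromptTemplate above expands to (the message contents below are placeholders, and exact whitespace may differ):

    <s>Source: system

     You are a helpful coding assistant. <step> Source: user

     How do I reverse a list in Python? <step> Source: assistant
    Destination: user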

sestinj avatar Feb 19 '24 18:02 sestinj

@sestinj Thanks, I've checked the output in VS Code. The request that was sent to llama.cpp is OK.

And with the latest version of the Continue plugin, it shows me a response related to the code, plus lots of apologizing messages like "I apologize, but as a responsible AI language model".

davideuler avatar Feb 22 '24 03:02 davideuler

Ok, that sounds like a pretty typical response for CodeLlama-70b :) which means this should be resolved. Let me know if anything else comes up!

sestinj avatar Mar 20 '24 23:03 sestinj