
Better prompting with templates: Setting a mandatory document start

Open rnbwdsh opened this issue on Apr 11, 2024 • 0 comments

Reason for this PR: My first test runs with ollama - especially with the speedy 1.6B and 3B models - were rather bad, because the models threw in a lot of explanation text before the ```, even when the model is not explicitly an -instruct variant fine-tuned for replying to chat-style conversations.

Most models have a suggested "template" that is used with template-string replacement to insert a system instruction (giving the model a role, e.g. "you are a senior Go tester and you reply with tests for classes") that is typically separate from the actual prompt.

Here are some examples for popular code models:

  • https://ollama.com/library/mistral:latest/blobs/e6836092461f
  • https://ollama.com/library/codellama:latest/blobs/2e0493f67d0c
  • https://ollama.com/library/dolphin-mistral:latest/blobs/62fbfd9ed093

As ollama is itself written in Go, these templates should - to my understanding - be proper Go template strings, like the ones you already use.
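
For illustration, here is a minimal sketch of how such a template could be rendered with Go's `text/template`. The template string is an assumption, reproduced from memory of the mistral blob linked above, so it may not match the blob exactly:

```go
package main

import (
	"os"
	"text/template"
)

// Assumed Mistral-style instruct template, as distributed by ollama.
const mistralTemplate = "[INST] {{ .System }} {{ .Prompt }} [/INST]"

type promptData struct {
	System string
	Prompt string
}

func main() {
	tmpl := template.Must(template.New("mistral").Parse(mistralTemplate))
	data := promptData{
		System: "You are a senior Go tester and you reply with tests for the given code.",
		Prompt: "Write unit tests for the package below.",
	}
	// Render the raw string that would be piped into the LLM / tokenizer.
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```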

You can fetch a model's template via /api/show?name=modelname in ollama (see the referenced API doc), and for most proprietary models the template is documented if you use the "generate" endpoint rather than the "chat" endpoint. A lot of coding models also have special "fill in the middle" templates.
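
A minimal sketch of fetching a model's template from a locally running ollama instance. I'm assuming the POST form of `/api/show` with a JSON body and a `template` field in the response, per my reading of the API doc:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// showResponse models only the field we care about here; the field name
// "template" is an assumption based on the ollama API doc.
type showResponse struct {
	Template string `json:"template"`
}

func fetchTemplate(model string) (string, error) {
	body, err := json.Marshal(map[string]string{"name": model})
	if err != nil {
		return "", err
	}
	resp, err := http.Post("http://localhost:11434/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var show showResponse
	if err := json.NewDecoder(resp.Body).Decode(&show); err != nil {
		return "", err
	}
	return show.Template, nil
}

func main() {
	tmpl, err := fetchTemplate("mistral:latest")
	if err != nil {
		panic(err)
	}
	fmt.Println(tmpl)
}
```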

This template is what is piped into the LLM / tokenizer raw. If you append to it, you can make the model complete from:

```go
package plain

import (....)

func
```

The model will then know that it is already past the markdown prefix, the package definition and the imports, and that it should not write class-based tests (which some models did for me).

However, having a prompt that already contains a mandatory header means slightly more/different postprocessing, and you probably need slightly different call logic for the openrouter stuff. Alternatively, the fetch function could return the suggested document_start plus the generated code.
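
As a rough illustration of that second option, here is a hypothetical sketch. `generateWithDocumentStart`, the `raw` flag usage and the response field name are my assumptions about ollama's `/api/generate` endpoint, not how this repository currently calls models:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type generateResponse struct {
	Response string `json:"response"`
}

// generateWithDocumentStart appends the mandatory document start to the
// already-rendered prompt, calls /api/generate in raw mode (assumed here,
// so that no chat template is applied a second time), and returns the
// suggested document start plus the generated code.
func generateWithDocumentStart(model, renderedPrompt, documentStart string) (string, error) {
	payload, err := json.Marshal(map[string]any{
		"model":  model,
		"prompt": renderedPrompt + documentStart, // the model completes after e.g. "func"
		"raw":    true,
		"stream": false,
	})
	if err != nil {
		return "", err
	}
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var result generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	// The caller sees one complete document and needs no markdown stripping.
	return documentStart + result.Response, nil
}

func main() {
	start := "package plain\n\nfunc "
	out, err := generateWithDocumentStart("mistral:latest", "[INST] Write a unit test. [/INST]\n", start)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```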

Such a suggested document start would not only improve benchmark performance, it would also reduce token cost a bit, because the model has to generate less.

rnbwdsh • Apr 11 '24 11:04