[FR]: optimizing prompts for custom models
Proposal summary
I want to optimize prompts for some custom model which returns a structured response, e.g.:
from typing import Any, List

import ollama
from pydantic import BaseModel, Field
from opik.evaluation.models import OpikBaseModel

class ResponseSchema(BaseModel):
    query: List[str] = Field(..., description="Some list")

class CustomModel(OpikBaseModel):
    def __init__(self, MODEL: str, TEMP: float, CONTEXT: int, Response: BaseModel):
        super().__init__(model_name=None)
        self.MODEL_NAME = MODEL
        self.TEMPERATURE = TEMP
        self.CONTEXT_LENGTH = CONTEXT
        self.Response = Response

    def generate_provider_response(self, **kwargs: Any) -> str:
        pass

    def agenerate_provider_response(self, **kwargs: Any) -> str:
        pass

    def agenerate_provider_response_stream(self, **kwargs: Any) -> str:
        pass

    def generate_string(self, input: str, **kwargs: Any) -> str:
        """Simplified interface to generate a string output from the model."""
        response = ollama.chat(
            messages=[
                {
                    'role': 'user',
                    'content': input,
                }
            ],
            model=self.MODEL_NAME,
            # Constrain the model output to the JSON schema of the response model.
            format=self.Response.model_json_schema(),
            options={"context_length": self.CONTEXT_LENGTH, "temperature": self.TEMPERATURE},
        )
        return response["message"]["content"]

    def agenerate_string(self, input: str, **kwargs: Any) -> str:
        pass

    def agenerate_prompt(self, input: str, **kwargs: Any) -> str:
        pass

my_custom_model = CustomModel('qwen3:235b-a22b', 0, 4096, ResponseSchema)
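For completeness, a quick way to sanity-check such a wrapper (a sketch, assuming a local Ollama server with qwen3:235b-a22b pulled):

# The ollama.chat call with format=<JSON schema> returns a JSON string that
# should match ResponseSchema, so pydantic can validate it directly.
raw = my_custom_model.generate_string("List three fruits")
print(raw)  # e.g. {"query": ["apple", "banana", "cherry"]}
parsed = ResponseSchema.model_validate_json(raw)
print(parsed.query)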
Currently, only strings can be passed to optimizer constructors, like:
optimizer = FewShotBayesianOptimizer(
    project_name=SOME_PROJECT_NAME,
    model="openai/gpt-4o-mini",
    temperature=0.1,
    max_tokens=5000,
)
while the desired option would look like this:
optimizer = FewShotBayesianOptimizer(
    project_name=SOME_PROJECT_NAME,
    model=my_custom_model,
)
Motivation
Structured output is now a standard requirement. When only a model-name string is passed, it is not possible to force the model to return structured output in all cases using the prompt alone. For this reason, the applicability of Opik prompt optimization in its current version is seriously limited.
Hi @taborzbislaw
Great suggestion, we support this in evaluations but not in the optimizer - I'll add it today
Hi,
thank you very much for your prompt answer. I am using custom models in evaluations and it works very well. Just one more question. When optimizing prompts I prepare a TaskConfig like:
task_config = TaskConfig(
instruction_prompt=initial_prompt,
input_dataset_fields=["some_query"],
output_dataset_field="expected_output",
use_chat_prompt=True,
)
where initial_prompt must be a string.
Optimizing prompt templates would also be a valuable feature, extending the applications of the Opik optimizers. I am thinking about something like:
client = opik.Opik()
prompt_template = client.get_prompt(name="prompt_template")
The prompt template could look like this:
Here is a question: {{question}}.
Here is some context {{context}}
Answer the question given the context
task_config = TaskConfig(
instruction_prompt = prompt_template,
prompt_input_dataset_fields={"question": "some_dataset_field1", "context": "some_dataset_field2"},
output_dataset_field="expected_output",
)
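To make the intended mapping concrete, here is a purely illustrative sketch of the substitution I have in mind (prompt_input_dataset_fields is a proposed parameter and render_prompt a hypothetical helper, neither exists in Opik today):

# Each {{placeholder}} in the template is filled from the dataset field it is
# mapped to in prompt_input_dataset_fields.
def render_prompt(template: str, mapping: dict, dataset_item: dict) -> str:
    rendered = template
    for placeholder, dataset_field in mapping.items():
        rendered = rendered.replace("{{" + placeholder + "}}", str(dataset_item[dataset_field]))
    return rendered

template = "Here is a question: {{question}}.\nHere is some context {{context}}\nAnswer the question given the context"
item = {"some_dataset_field1": "What is Opik?", "some_dataset_field2": "Opik is an open-source LLM evaluation platform."}
print(render_prompt(template, {"question": "some_dataset_field1", "context": "some_dataset_field2"}, item))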
So, again, features like the ones above are already available for evaluations, so extending them to the optimizers would be really valuable.
Hello, any update on this?
Hello @teenaxta and @taborzbislaw,
We are busy designing updates to the Opik optimizer system that will allow such variations. This is a big project; stay tuned!
Hi @teenaxta, @taborzbislaw
We've finished the refactoring and now support optimizing structured outputs, I've updated the docs here: https://www.comet.com/docs/opik/agent_optimization/opik_optimizer/models#customizing-the-llm-response
Let me know if you have any feedback
Hi,
can you please give a working example of optimization using structured output? There is no complete code on the Opik webpage, so I was trying to compose the code from snippets on different pages. I tried code like this (with a few variants):
###########################################
from pydantic import BaseModel
from openai import OpenAI
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import MetaPromptOptimizer, ChatPrompt
from opik_optimizer.datasets import tiny_test
client = OpenAI()
class AnswerModel(BaseModel):
    answer: str
def invoke_llm(model, messages, tools):
    completion = client.chat.completions.parse(
        model=model,
        messages=messages,
        response_format=AnswerModel,
    )
    return completion.choices[0].message.parse
prompt = ChatPrompt(
    name="demo_prompt",
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "{question}"}
    ],
    invoke=invoke_llm
)
# You can use a demo dataset for testing, or your own dataset
dataset = tiny_test()
print(f"Using dataset: {dataset.name}, with {len(dataset.get_items())} items.")
# This example uses Levenshtein distance to measure output quality
def levenshtein_ratio(dataset_item, llm_output):
    metric = LevenshteinRatio()
    return metric.score(reference=dataset_item['label'], output=llm_output)
print("Starting optimization...")
result = optimizer.optimize_prompt(
prompt=prompt,
dataset=dataset,
metric=levenshtein_ratio,
)
##################################################
When running, I got errors like:
Starting optimization...
╭────────────────────────────────────────────────────────────────────╮
│ ● Running Opik Evaluation - MetaPromptOptimizer │
│ │
│ -> View optimization details in your Opik dashboard │
╰────────────────────────────────────────────────────────────────────╯
> Let's optimize the prompt:
╭─ system ───────────────────────────────────────────────────────────╮
│ │
│ You are a helpful assistant. │
│ │
╰────────────────────────────────────────────────────────────────────╯
╭─ user ─────────────────────────────────────────────────────────────╮
│ │
│ {question} │
│ │
╰────────────────────────────────────────────────────────────────────╯
Using MetaPromptOptimizer with the parameters:
- n_samples: None
- auto_continue: False
> First we will establish the baseline performance:
Evaluation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
[2025-07-12 20:39:30] ERROR Error calling model with prompt: 'Completions' object has no attribute 'parse'  (meta_prompt_optimizer.py:312)
ERROR Failed prompt: [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': '{question}'}]  (meta_prompt_optimizer.py:313)
ERROR Prompt length: 38  (meta_prompt_optimizer.py:314)
ERROR Error calling model with prompt: 'Completions' object has no attribute 'parse'  (meta_prompt_optimizer.py:312)
Thank you in advance
Your invoke_llm() function should return a string. Does completion.choices[0].message have a property parse? It doesn't look like it from the error message.
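For reference, a corrected invoke_llm could look like this (a sketch, assuming an openai SDK version where chat.completions.parse() and message.parsed are available):

def invoke_llm(model, messages, tools):
    completion = client.chat.completions.parse(
        model=model,
        messages=messages,
        response_format=AnswerModel,
    )
    parsed = completion.choices[0].message.parsed  # an AnswerModel instance (may be None on refusal)
    return parsed.answer  # hand a plain string back to the optimizer/metric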
Thank you for the hint that invoke_llm() should return a string. The optimization works now.
Glad that worked!