Different results with and without Instructor
I am using the same model (llama-3-70b) for sentiment analysis, and I am getting very different results (in terms of sentiments) with the Instructor library versus without it. I am using the together.ai API, and all the specified parameters are the same, including temperature, prompts, max_tokens, etc. Could you help me understand what might be the reason for this?
Without Instructor: Call == Text
With Instructor: Call == Text + Output_Schema
More work to be done, more things to take care of, more 'cognitive load'.
I experienced the same, but luckily found a solution that I have used ever since: https://github.com/jxnl/instructor/discussions/497#discussioncomment-8979998
If the prompt is well designed, this approach works really well (for me). If you add k-shot examples, it works even better (read https://eugeneyan.com/writing/prompting/).
Something I use a lot to improve the quality of the output is a 'reasoning' field. It goes at the end of the prompt:
# Output
reasoning: str # Reasoning behind your chosen solution
solution: int # Selected choice.
'solution' could be a bool, int, str, ... whatever you need. Just don't ask only for the solution; make the model elaborate on how/why it got there. Some call it the 'sketchboard' / 'sketchpad' / 'scratchpad' (as in the MedPrompt+ paper).
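For anyone who wants to try this with Instructor, here is a minimal sketch of that pattern against Together's OpenAI-compatible endpoint. The model id, field names, and label encoding are illustrative assumptions, not something from this thread:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

# Illustrative response model: ask for the reasoning first, then the answer,
# so the model has to explain itself before committing to a label.
class Sentiment(BaseModel):
    reasoning: str = Field(description="Reasoning behind your chosen solution")
    solution: int = Field(description="Selected choice, e.g. 0=negative, 1=neutral, 2=positive")

# Together exposes an OpenAI-compatible endpoint; MD_JSON mode avoids
# relying on native function calling.
client = instructor.from_openai(
    OpenAI(base_url="https://api.together.xyz/v1", api_key="..."),
    mode=instructor.Mode.MD_JSON,
)

result = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",  # assumed Together model id
    response_model=Sentiment,
    messages=[{"role": "user", "content": "Classify the sentiment of: 'The service was fine, I guess.'"}],
    temperature=0.0,
)
print(result.reasoning, result.solution)
```

Declaring 'reasoning' before 'solution' matters: the fields appear in that order in the schema, so the model typically writes its explanation before committing to an answer.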
From my use of Instructor (with md_json mode on gpt-3.5), Instructor adds some things to the prompt, which was surprising; I only caught it by using Langfuse:
As a genius expert, your task is to understand the content and provide
the parsed objects in json that match the following json_schema:
{json_schema from pydantic model}
Make sure to return an instance of the JSON, not the schema itself.
You can add your own system message, but from what I could find you can't change these two added bits.
Also, Instructor parses the LLM output with Pydantic models and will retry to fix the output if it's incorrect, out of the box, which is nice and less error-prone (especially on smaller models) :)
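As a sketch of that retry behavior (the validator and max_retries value here are illustrative, not from the thread): when Pydantic validation fails, Instructor re-asks the model with the error message attached instead of raising immediately.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class Label(BaseModel):
    sentiment: str

    @field_validator("sentiment")
    @classmethod
    def must_be_known(cls, v: str) -> str:
        # If the model answers outside this set, Instructor feeds the error
        # back to the model and retries instead of failing right away.
        if v not in {"positive", "neutral", "negative"}:
            raise ValueError("sentiment must be positive, neutral or negative")
        return v

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.MD_JSON)

label = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Label,
    max_retries=2,  # number of re-ask attempts on validation failure
    messages=[{"role": "user", "content": "Sentiment of: 'I loved it.'"}],
)
```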
@baghdasaryanani when you're using the togetherai api for llama without instructor, is it with function calling or without function calling?
@kondera I think this is for Gemini and JSON mode, if I'm not wrong. We implemented this a while ago to prompt the model to output structured output. Do you have a proposed way to turn this off and on?
I'm using OpenAI models on Azure (the prompt above is from Langfuse). I'm using md_json mode because function calling doesn't really work for me: the model returns a JS function and not the extracted data (I'm mostly working with 3.5, so it's not surprising that it makes mistakes). I'm not sure how to turn it on or off, but I was just surprised to see this added. I am also interested in whether, if I enter my own system instruction and then Instructor adds its own, that can confuse the model. I couldn't find in the documentation how Instructor formats the prompt from the Pydantic model, which would be nice: there is a lot of documentation on how to prompt with Pydantic, but no way to see the 'raw prompt' (is there one, btw?).
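On seeing the 'raw prompt': this isn't an Instructor feature, but one generic way is to hand the OpenAI client an httpx client with a request hook and print the outgoing payload after Instructor has modified it. A sketch, assuming a non-streaming JSON request body; the model name and response model are placeholders:

```python
import httpx
import instructor
from openai import OpenAI
from pydantic import BaseModel

def log_request(request: httpx.Request) -> None:
    # Print the exact JSON payload (messages, schema text, etc.)
    # that is sent after Instructor has added its instructions.
    print(request.method, request.url)
    print(request.content.decode())

client = instructor.from_openai(
    OpenAI(http_client=httpx.Client(event_hooks={"request": [log_request]})),
    mode=instructor.Mode.MD_JSON,
)

class Label(BaseModel):
    sentiment: str

client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Label,
    messages=[{"role": "user", "content": "Sentiment of: 'meh.'"}],
)
```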
@ivanleomk It is without function calling.
Do you happen to have a fully fledged example (on GitHub?) for your best practices? Thanks @Mr-Ruben
Closing as part of repository maintenance for issues created before 2025.