Different results with and without Instructor
I am using the same model (llama-3-70b) for sentiment analysis, and I am getting very different results (in terms of sentiments) with the Instructor library versus without it. I am using the together.ai API, and all the specified parameters are the same, including temperature, prompts, max_tokens, etc. Could you help me understand what might be the reason for this?
Without Instructor: Call == Text
With Instructor: Call == Text + Output_Schema
More work to be done, more things to take care of, more 'cognitive load'.
I experienced the same, but luckily found a solution that I have used ever since: https://github.com/jxnl/instructor/discussions/497#discussioncomment-8979998
If the prompt is well designed, this approach works really well (for me). If you add k-shot examples, it works even better (read https://eugeneyan.com/writing/prompting/).
Something I use a lot to improve the quality of the output is a 'reasoning' field. It goes at the end of the prompt:
# Output
reasoning: str # Reasoning behind your chosen solution
solution: int # Selected choice.
'solution' could be a bool, int, str, ... whatever you need. Just don't ask only for the solution; make the model elaborate on how/why it got there. Some call it the 'sketchboard' / 'sketchpad' / 'scratchpad' (as in the MedPrompt+ paper).
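For anyone who wants to try this with Instructor, here is a minimal sketch of that pattern against Together's OpenAI-compatible endpoint. The model id, field names, and label encoding are illustrative assumptions, not something from this thread:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

# Illustrative response model: ask for the reasoning first, then the answer,
# so the model has to explain itself before committing to a label.
class Sentiment(BaseModel):
    reasoning: str = Field(description="Reasoning behind your chosen solution")
    solution: int = Field(description="Selected choice, e.g. 0=negative, 1=neutral, 2=positive")

# Together exposes an OpenAI-compatible endpoint; MD_JSON mode avoids
# relying on native function calling.
client = instructor.from_openai(
    OpenAI(base_url="https://api.together.xyz/v1", api_key="..."),
    mode=instructor.Mode.MD_JSON,
)

result = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",  # assumed Together model id
    response_model=Sentiment,
    messages=[{"role": "user", "content": "Classify the sentiment of: 'The service was fine, I guess.'"}],
    temperature=0.0,
)
print(result.reasoning, result.solution)
```

Declaring 'reasoning' before 'solution' matters: the fields appear in that order in the schema, so the model typically writes its explanation before committing to an answer.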
From my use of Instructor (with md_json mode on gpt-3.5), Instructor adds some things to the prompt, which was surprising; I only caught it by using Langfuse:
As a genius expert, your task is to understand the content and provide
the parsed objects in json that match the following json_schema:
{json_schema from pydantic model}
Make sure to return an instance of the JSON, not the schema itself.
You can add your own system message, but from what I could find you can't change these two added bits.
Also, Instructor parses the LLM output with Pydantic models and will retry to fix the output if it's incorrect, out of the box, which is nice and less error-prone (especially on smaller models) :)
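As a sketch of that retry behavior (the validator and max_retries value here are illustrative, not from the thread): when Pydantic validation fails, Instructor re-asks the model with the error message attached instead of raising immediately.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class Label(BaseModel):
    sentiment: str

    @field_validator("sentiment")
    @classmethod
    def must_be_known(cls, v: str) -> str:
        # If the model answers outside this set, Instructor feeds the error
        # back to the model and retries instead of failing right away.
        if v not in {"positive", "neutral", "negative"}:
            raise ValueError("sentiment must be positive, neutral or negative")
        return v

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.MD_JSON)

label = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Label,
    max_retries=2,  # number of re-ask attempts on validation failure
    messages=[{"role": "user", "content": "Sentiment of: 'I loved it.'"}],
)
```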
@baghdasaryanani when you're using the togetherai api for llama without instructor, is it with function calling or without function calling?
@kondera I think this is for Gemini and JSON mode, if I'm not wrong. We implemented this a while ago to prompt the model to output structured output. Do you have a proposed way to turn this off and on?
I'm using OpenAI models on Azure (the prompt above is from Langfuse). I'm using md_json mode because function calling doesn't really work for me: the model returns a JS function and not the extracted data (I'm mostly working with 3.5, so it's not surprising that it makes mistakes). I'm not sure how to turn it on or off, but I was just surprised to see this added. I am also interested in whether, if I enter my own system instruction and then Instructor adds its own, that can confuse the model. I couldn't find in the documentation how Instructor formats the prompt from the Pydantic model, which would be nice: there is a lot of documentation on how to prompt with Pydantic, but no way to see the 'raw prompt' (is there one, btw?).
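On seeing the 'raw prompt': this isn't an Instructor feature, but one generic way is to hand the OpenAI client an httpx client with a request hook and print the outgoing payload after Instructor has modified it. A sketch, assuming a non-streaming JSON request body; the model name and response model are placeholders:

```python
import httpx
import instructor
from openai import OpenAI
from pydantic import BaseModel

def log_request(request: httpx.Request) -> None:
    # Print the exact JSON payload (messages, schema text, etc.)
    # that is sent after Instructor has added its instructions.
    print(request.method, request.url)
    print(request.content.decode())

client = instructor.from_openai(
    OpenAI(http_client=httpx.Client(event_hooks={"request": [log_request]})),
    mode=instructor.Mode.MD_JSON,
)

class Label(BaseModel):
    sentiment: str

client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Label,
    messages=[{"role": "user", "content": "Sentiment of: 'meh.'"}],
)
```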
@ivanleomk It is without function calling.
Do you happen to have a fully fledged example (on GitHub?) for your best practices? Thanks @Mr-Ruben
Closing as part of repository maintenance for issues created before 2025.