Single string output adapter occasionally comes back empty or generic
It looks like, because of the way we're using function calling to get structured outputs back, some models occasionally return empty strings for the parameters, or just echo back the description/prompt they were given.
A couple initial thoughts:
- Could use a couple examples of generators that expose the issue
- What are some automated tests that could help us catch things like this in the future?
- Curious to see if this issue happens in output adapters that are more complex, for example: #12 #13 #14
- The Python library Instructor uses a technique similar to ours, inverting the way function calling is normally done (how do they solve it? how are their prompts different from ours?)
- It seems like there are probably a few different ways to solve this, really curious to play with them and see what they feel like...
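On the automated-tests question above, here is a rough sketch of the kind of check a test could run against a generator's single string result. It only covers the two failure modes described in this issue (an empty string, or the description/prompt echoed back). `suspicious_output?` and its keyword arguments are hypothetical names for illustration, not anything in the library today.

```ruby
# Hypothetical helper (not part of the library): returns true when a
# single-string result looks like one of the failures described in this issue.
def suspicious_output?(value, description:, prompt: nil)
  text = value.to_s.strip

  # Failure mode 1: the model returned an empty (or whitespace-only) string.
  return true if text.empty?

  # Failure mode 2: the model echoed back the description or prompt it was given.
  normalized = text.downcase
  [description, prompt].compact.any? do |given|
    normalized == given.to_s.strip.downcase
  end
end
```

A spec could call a generator a handful of times per model and fail if any result is flagged. Exact-match comparison is deliberately conservative, so a paraphrased echo would still slip through.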
A few more notes on this for posterity. The historical event finder tutorial demo has the following issues with the models below:
- gpt 4: 50/50
- gpt 4o: 👎
When renaming the function in the API calls to "formatter" or "format_response", 4o behaves very well, BUT blueprints begins to behave badly.
When renaming the function to something generic like "response" or "function", blueprints continues to work well, but historical event finder gets even worse.
Local models:
- llama3:8b does not play well with event finder either (this was done through XML)
- llama3.1:8b DOES play well with event finder
^ both work with blueprints
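To make the renaming experiment above easier to repeat, something like the following could build the same tool definition under each candidate function name. This is a sketch that assumes an OpenAI-style `tools` / `tool_choice` request shape; the method name, the parameter name, and the description text are made up for illustration and are not what the library actually sends.

```ruby
# Hypothetical builder: the same single-string tool definition, with the
# function name as the only thing that varies between runs.
# Assumes an OpenAI-style `tools` / `tool_choice` request shape.
def single_string_tool_payload(function_name:, parameter_name:, description:)
  {
    tools: [
      {
        type: "function",
        function: {
          name: function_name,
          description: description,
          parameters: {
            type: "object",
            properties: {
              parameter_name => { type: "string", description: description }
            },
            required: [parameter_name]
          }
        }
      }
    ],
    # Force the model to call our one function.
    tool_choice: { type: "function", function: { name: function_name } }
  }
end

# The names tried in the notes above.
CANDIDATE_NAMES = %w[formatter format_response response function].freeze

payloads = CANDIDATE_NAMES.map do |name|
  single_string_tool_payload(
    function_name: name,
    parameter_name: "historical_event",       # illustrative parameter name
    description: "A notable historical event" # illustrative description
  )
end
```

Running each payload against each model and generator a few times would turn the observations above into a small comparison grid instead of one-off impressions.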
Hmm, if you have the llama models set up, want to see how they play with the list_of_strings output adapter? I'm curious whether the single string output adapter might just be too simple, and whether a more complex data type would perform better...
Sorry, this took a bit to rebase; it had some conflicts.
for historical event finder / llama3:8b: {"error"=>"llama3 does not support tools"} bah humbug
for historical event finder / llama3.1:8b: ["First Landing by Vikings in North America", "Independence Day of Chile", "Death of Joseph Stalin"] (I just changed historical event finder to return a list of 3)
I could revert to the XML approach to test this out if we'd like to see how that would affect it!
Nah, it seems like everyone is converging on this JSON spec version of tool calling, so I think it's fine. I was more curious to see if that theory about more complex data types could be a fruitful path...
Once we have the universal JSON spec formatter, we could change single string into something like the parameter itself plus an additional ignored parameter like "explanation" or "notes" or something... which may also end up increasing the quality of the output anyway.
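A rough sketch of what that could look like: a JSON schema with the real parameter plus an "explanation" parameter that the adapter throws away when extracting the result. The method names here are hypothetical, and this assumes the model's tool-call arguments come back as a JSON string, which is an assumption about the response shape rather than the library's actual code.

```ruby
require "json"

# Hypothetical schema builder: the real parameter plus an ignored
# "explanation" parameter that gives the model somewhere to put its reasoning.
def single_string_with_notes_schema(name:, description:)
  {
    type: "object",
    properties: {
      name => { type: "string", description: description },
      "explanation" => {
        type: "string",
        description: "Brief reasoning behind the answer; discarded by the adapter"
      }
    },
    required: [name, "explanation"]
  }
end

# Hypothetical extractor: parses the tool-call arguments (assumed to be a
# JSON string) and returns only the real parameter, dropping "explanation".
# Raises KeyError if the model omitted the parameter entirely.
def extract_single_string(arguments_json, name:)
  JSON.parse(arguments_json).fetch(name)
end
```

Whether the extra field actually reduces empty or echoed outputs is untested here; it would need the same per-model runs as the notes above.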