Implement first-class approach to testing GPTScripts
I recently wrote out a sample of what an integration suite could look like if written entirely in GPTScript. While we may not move forward with that specific approach, I'd like to propose a first-class way to do this, not just for our examples.
My proposition here is a `gptscript test <file>.gpt` command that calls a built-in tool along the lines of `sys.test`. Then the process becomes calling `sys.test` on a structured
I'm open to ideas on how this UX could be improved, but I think that defining a standardized testing approach would be super beneficial.
Some of the trade-offs:
- If we hand off testing to the LLM, results won't always be consistent: good on some runs and worse on others.
- More structured testing removes some of the benefits that an LLM would provide.
- I'd like to achieve some sort of middle-ground with this approach.
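To make the middle ground a bit more concrete, here is a rough sketch in Python rather than GPTScript. It is not a proposed implementation of `sys.test`. The `RunResult` type, `check_run`, and `similarity_judge` are all hypothetical names invented for illustration. The idea is to compare tool calls exactly, since they are structured, and to hand only the free-form output to a pluggable "judge". In a real setup the judge would presumably be an LLM call; here a plain string-similarity check stands in so the example runs offline.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class RunResult:
    """Hypothetical record of one script run: tool names called, plus final output."""
    tool_calls: List[str]
    output: str


# A judge decides whether two outputs are "equivalent enough".
# Assumption: in practice this would wrap an LLM call; any callable works.
Judge = Callable[[str, str], bool]


def similarity_judge(expected: str, actual: str, threshold: float = 0.6) -> bool:
    """Offline stand-in for an LLM judge, based on character-level similarity."""
    ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return ratio >= threshold


def check_run(expected: RunResult, actual: RunResult,
              judge: Judge = similarity_judge) -> List[str]:
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    # Structured part: the tool-call sequence must match exactly.
    if expected.tool_calls != actual.tool_calls:
        failures.append(
            f"tool calls differ: expected {expected.tool_calls}, got {actual.tool_calls}"
        )
    # Fuzzy part: the free-form output is delegated to the judge.
    if not judge(expected.output, actual.output):
        failures.append("output is not equivalent to the expected output")
    return failures


if __name__ == "__main__":
    expected = RunResult(["sys.read", "sys.write"], "The file was copied successfully.")
    actual = RunResult(["sys.read", "sys.write"], "The file was successfully copied.")
    print(check_run(expected, actual))
```

Keeping the judge as a parameter means the deterministic checks stay reproducible, and the inconsistency an LLM introduces is confined to the one comparison that actually needs it.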
Our smoke tests do this to some extent now. While this doesn't exactly amount to "first-class gptscript support", it does use gpt-4o to perform fuzzy equality on the output and tool calls from executing a script.
I'm going to close this out, but I think we definitely can and should reopen if we feel the smoke tests don't meet our needs here.