Implement first-class approach to testing GPTScripts

Open tylerslaton opened this issue 11 months ago • 1 comments

I recently wrote a sample of what an integration suite written entirely in GPTScript could look like. While we may not move forward with that specific approach, I'd like to propose a first-class way to test GPTScripts, not just our examples.

My proposal is a gptscript test <file>.gpt command that calls a built-in tool along the lines of sys.test. The process then becomes calling sys.test on a structured .test.gpt file that specifies how to test the target file and which test cases to run.
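To make the shape of that structured file concrete, here is a rough Python sketch of the information it might carry and how a runner could walk it. Nothing here exists in gptscript today: TestCase, TestSpec, run_spec, and the run_script and judge callables are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestCase:
    name: str
    input: str   # input handed to the script under test
    expect: str  # plain-language description of the expected result


@dataclass
class TestSpec:
    target: str  # script under test, e.g. "echo.gpt" (hypothetical)
    how: str     # "how to test the file" instructions
    cases: List[TestCase] = field(default_factory=list)


def run_spec(
    spec: TestSpec,
    run_script: Callable[[str, str], str],
    judge: Callable[[str, str, str], bool],
) -> Dict[str, bool]:
    """Run every case and record whether the judge accepted the output."""
    results = {}
    for case in spec.cases:
        output = run_script(spec.target, case.input)
        results[case.name] = judge(spec.how, case.expect, output)
    return results


# Stand-ins for a real script runner and a real judge (assumptions).
spec = TestSpec(
    target="echo.gpt",
    how="Compare meaning, not exact wording.",
    cases=[TestCase(name="greets", input="hello", expect="hello")],
)
results = run_spec(
    spec,
    run_script=lambda target, text: text.upper(),
    judge=lambda how, expect, out: expect.lower() in out.lower(),
)
```

Keeping the runner and the judge as plain callables leaves open whether the judging is done by an LLM, by exact matching, or by a mix of both.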

I'm open to ideas on how this UX could be improved, but I think defining a standardized testing approach would be super beneficial.

tylerslaton avatar Feb 29 '24 17:02 tylerslaton

Some of the trade-offs:

  • If we hand off testing to the LLM, results won't always be consistent. They'll be good sometimes and worse other times.
  • More structured testing removes some of the benefits that an LLM would provide.
  • I'd like to achieve some sort of middle-ground with this approach.
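One possible middle ground, sketched below under assumptions: run cheap deterministic checks first, then ask an LLM judge several times and take the majority verdict to smooth out run-to-run inconsistency. The check function, the must_contain list, and the judge callable are hypothetical and not part of gptscript.

```python
from typing import Callable, List


def check(
    output: str,
    must_contain: List[str],
    judge: Callable[[str, str], bool],
    expectation: str,
    votes: int = 3,
) -> bool:
    """Deterministic gate first, then a majority vote over repeated judge calls."""
    # Structured part: fail fast if a required substring is missing.
    if any(required not in output for required in must_contain):
        return False
    # Fuzzy part: the judge may disagree with itself, so ask several times.
    passes = sum(1 for _ in range(votes) if judge(expectation, output))
    return passes * 2 > votes


# A deliberately flaky stand-in judge (assumption): two yes votes, one no.
answers = iter([True, False, True])
flaky_judge = lambda expectation, output: next(answers)

accepted = check("Paris is the capital", ["Paris"], flaky_judge, "names the capital")
rejected = check("I don't know", ["Paris"], flaky_judge, "names the capital")
```

The second call never reaches the judge, because the deterministic gate already failed.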

tylerslaton avatar Feb 29 '24 18:02 tylerslaton

Our smoke tests do this to some extent now. While they don't exactly have "first-class gptscript support", they do use gpt-4o to perform fuzzy equality checks on the output and tool calls from executing a script.
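For readers unfamiliar with the idea, fuzzy equality can be sketched roughly as follows. This is not the smoke-test implementation: build_judge_prompt, fuzzy_equal, the prompt wording, the JSON reply format, and the ask_model callable are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict


def build_judge_prompt(expected: Dict[str, Any], actual: Dict[str, Any]) -> str:
    """Render both runs (output plus tool calls) into one comparison prompt."""
    return (
        "Decide whether ACTUAL is semantically equivalent to EXPECTED.\n"
        'Reply with JSON only: {"equal": true or false, "reason": "..."}\n'
        f"EXPECTED:\n{json.dumps(expected, indent=2, sort_keys=True)}\n"
        f"ACTUAL:\n{json.dumps(actual, indent=2, sort_keys=True)}\n"
    )


def fuzzy_equal(
    expected: Dict[str, Any],
    actual: Dict[str, Any],
    ask_model: Callable[[str], str],
) -> bool:
    """Return True only if the model replies with a well-formed positive verdict."""
    reply = ask_model(build_judge_prompt(expected, actual))
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        return False  # treat an unparseable reply as a failed comparison
    return isinstance(verdict, dict) and verdict.get("equal") is True


# Stand-in for a real model call (assumption).
expected = {"output": "It is sunny.", "tool_calls": ["weather"]}
actual = {"output": "The weather is sunny.", "tool_calls": ["weather"]}
same = fuzzy_equal(expected, actual, lambda prompt: '{"equal": true, "reason": "same meaning"}')
garbled = fuzzy_equal(expected, actual, lambda prompt: "not json")
```

Treating any malformed reply as a failure keeps a confused judge from silently passing a test.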

I'm going to close this out, but I think we definitely can and should reopen if we feel the smoke tests don't meet our needs here.

njhale avatar Jul 09 '24 05:07 njhale