
feat: Add experimental integration tests with cached providers


What does this PR do?

Introduce a new workflow to run integration and agent tests against the OpenAI provider, and a subset of inference tests against Fireworks AI.

Responses from inference providers are cached and reused in subsequent jobs, which reduces API quota usage, speeds up test runs, and improves reliability by avoiding server-side errors (see https://github.com/derekhiggins/cachemeifyoucan for the caching code).
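The caching approach can be pictured roughly as a recording proxy in front of the provider: each outgoing request is hashed into a key, and the stored response is replayed on a hit. The sketch below is illustrative only and does not reflect cachemeifyoucan's actual API; the cache directory, `call_provider` callable, and key scheme are assumptions.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical sketch of request/response caching in front of an inference
# provider; cachemeifyoucan's real implementation and API may differ.
CACHE_DIR = Path(".provider-cache")


def _cache_key(method: str, url: str, body: dict) -> str:
    # Key on the full request, so any change to a test's request misses the cache.
    payload = json.dumps({"method": method, "url": url, "body": body}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def cached_request(method: str, url: str, body: dict, call_provider) -> dict:
    CACHE_DIR.mkdir(exist_ok=True)
    entry = CACHE_DIR / f"{_cache_key(method, url, body)}.json"
    if entry.exists():
        # Cache hit: replay the stored response, using no provider quota.
        return json.loads(entry.read_text())
    # Cache miss: call the real provider and record the response for later runs.
    response = call_provider(method, url, body)
    entry.write_text(json.dumps(response))
    return response
```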

The cache is updated on successful job completion.
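One way to get "update on success" semantics is to record new responses into a staging area during the run and only merge them into the shared cache once the job finishes cleanly. A minimal sketch, assuming hypothetical directory names and a merge step that is not necessarily how the workflow actually does it:

```python
import shutil
from pathlib import Path

# Hypothetical: promote newly recorded responses into the shared cache only
# after the test job has completed successfully.
def promote_staged_entries(staging_dir: str = ".provider-cache-staging",
                           cache_dir: str = ".provider-cache") -> int:
    staging, cache = Path(staging_dir), Path(cache_dir)
    cache.mkdir(exist_ok=True)
    promoted = 0
    for entry in staging.glob("*.json"):
        target = cache / entry.name
        if not target.exists():
            shutil.copy2(entry, target)
            promoted += 1
    return promoted
```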

To prevent the cache from diverging or growing indefinitely, it should be refreshed periodically or rebuilt manually by running this job without a cache (I will do this as a follow-up if this merges).

Any updates to integration tests that change the requests sent to the provider will miss the cache and use provider quota (the cache is then updated on success), so be careful not to repeatedly re-run failing PRs.

Related: #2017

derekhiggins · May 15, 2025