test: Add completion models unit & matrix integration tests

Open faassen opened this issue 4 months ago • 5 comments

Given @0xMochan's comment on #66, I took the liberty of creating an issue for it. Here are some of my thoughts; I hope they're helpful.

Integration testing of external services is a tricky issue. Even without the stochastic nature of LLMs, there is no guarantee that replaying a request against the same external service is going to give the same response, as state could have changed downstream.

A generic mock service is very useful, but it only tests part of the integration. We also know that specific integrations can break, and it's difficult to test them thoroughly by hand.

How would manual testing work?

There would be a bunch of scenarios, and you'd run them, eyeballing whether the correct behavior happens. It'd be nice if those scenarios could be generic and specific clients (openai, anthropic, etc) can be injected. This way most of the manual testing code can be reused.
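A rough sketch of what "generic scenarios with injected clients" could look like. Everything here is hypothetical (the `CompletionClient` trait, `EchoClient`, the scenario itself); it is not rig's actual API, just the shape of the idea, and `EchoClient` stands in for a real provider client:

```rust
// All names in this sketch are hypothetical; none of this is rig's API.

/// The minimum a scenario needs from any provider client.
trait CompletionClient {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in for a provider-specific client (openai, anthropic, etc).
struct EchoClient;

impl CompletionClient for EchoClient {
    fn name(&self) -> &str {
        "echo"
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// A scenario written once and reused for every injected client.
fn greeting_scenario(client: &dyn CompletionClient) -> Result<String, String> {
    client.complete("Say hello")
}

/// Runs the scenario against each client and collects the outcomes,
/// labelled by client name, so they can be eyeballed side by side.
fn run_all(clients: &[&dyn CompletionClient]) -> Vec<(String, Result<String, String>)> {
    clients
        .iter()
        .map(|c| (c.name().to_string(), greeting_scenario(*c)))
        .collect()
}
```

Calling `run_all(&[&EchoClient])` gives one labelled result per client; adding a provider means implementing the trait, not rewriting the scenarios.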

This can also lead to at least some "smoke tests" - while we don't know the details of the responses, we know the response should be successful, and that the right type of response is returned (such as the request to use a tool).
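A smoke check along these lines might only look at the kind of response, not its content. This is a minimal sketch under assumptions: the `Response` and `Expected` enums and `smoke_check` are made up for illustration and don't correspond to rig's types:

```rust
// Hypothetical types for illustration; not rig's actual response types.

#[derive(Debug)]
enum Response {
    Text(String),
    ToolCall { name: String, arguments: String },
}

/// The kind of response a scenario expects, ignoring the details.
#[derive(Debug, PartialEq)]
enum Expected {
    Text,
    ToolCall,
}

/// Passes when the request succeeded and the response is of the expected
/// kind. The content itself is deliberately not inspected.
fn smoke_check(result: &Result<Response, String>, expected: Expected) -> Result<(), String> {
    match (result, expected) {
        (Ok(Response::Text(_)), Expected::Text) => Ok(()),
        (Ok(Response::ToolCall { .. }), Expected::ToolCall) => Ok(()),
        (Ok(other), expected) => Err(format!("expected {expected:?}, got {other:?}")),
        (Err(e), _) => Err(format!("request failed: {e}")),
    }
}
```

A scenario that asks the model to use a tool would then assert `smoke_check(&result, Expected::ToolCall).is_ok()`, which holds regardless of the exact arguments the model picked.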

How to automate such smoke tests?

Once a test passes with the real API, the requests (including parameters, body, etc) as well as the full responses can be recorded and serialized into files. So the real clients need to grow an optional recording feature. Then you could have fake versions of the various clients (openai, anthropic, etc) which, instead of making a real request, check whether the request matches a recording and synthesize the response from that recording.
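To make the record/replay idea concrete, here is a small sketch. It is an assumption-heavy illustration: `Transport`, `Recorder`, `Replayer` and `FakeApi` are hypothetical names, recordings are kept in memory rather than serialized to files, and a request only matches a recording when method, URL and body are all identical:

```rust
use std::collections::HashMap;

// Hypothetical types for illustration; not rig's or reqwest's API.

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Request {
    method: String,
    url: String,
    body: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Anything that can turn a request into a response.
trait Transport {
    fn send(&mut self, request: &Request) -> Result<Response, String>;
}

/// Wraps a real transport and remembers every successful exchange.
struct Recorder<T: Transport> {
    inner: T,
    recordings: HashMap<Request, Response>,
}

impl<T: Transport> Transport for Recorder<T> {
    fn send(&mut self, request: &Request) -> Result<Response, String> {
        let response = self.inner.send(request)?;
        self.recordings.insert(request.clone(), response.clone());
        Ok(response)
    }
}

/// Fake transport: answers only from recordings, never touches the network.
struct Replayer {
    recordings: HashMap<Request, Response>,
}

impl Transport for Replayer {
    fn send(&mut self, request: &Request) -> Result<Response, String> {
        self.recordings
            .get(request)
            .cloned()
            .ok_or_else(|| format!("no recording for {} {}", request.method, request.url))
    }
}

/// Stand-in for the real API so the sketch runs without network access.
struct FakeApi;

impl Transport for FakeApi {
    fn send(&mut self, request: &Request) -> Result<Response, String> {
        Ok(Response {
            status: 200,
            body: format!("reply to {}", request.body),
        })
    }
}
```

A test run would wrap the real transport in a `Recorder` once, persist `recordings`, and later hand them to a `Replayer`. A request without a matching recording fails loudly, which is what you want when a client starts sending something different.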

Having written all that, I thought "wait, this could all happen on the reqwest layer", and I found a library that does something like what I just described:

https://docs.rs/reqwest_mock/latest

It turns out to be discontinued, but its README lists a bunch of alternatives to explore:

  • https://github.com/chorusone/rvcr
  • https://github.com/lukemathwalker/wiremock-rs
  • https://github.com/beltram/stubr
  • https://github.com/alexliesenfeld/httpmock
  • https://github.com/lipanski/mockito

I'll read a bit about these next.

faassen, Oct 21 '24 19:10