
Improving testing for steerable models

RobinPicard opened this issue 4 months ago

We already have tests for steerable models, but they do not cover all possible situations, especially for Transformers, which supports a wide range of models. We should not only test initialization with various models but also run all inference tests (possibly with all backends as well). Additionally, we should consider running the same types of inference tests multiple times to improve the probability of catching flaky tests caused by the stochastic nature of LLM text generation.
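A minimal sketch of what such a parametrized suite could look like. The `load_steerable_model` helper, the backend list, and the tiny test checkpoint are placeholders (not the actual matrix or constructors used in the repository), and the generation call is illustrative only:

```python
import pytest

# Placeholder matrix: the real suite would enumerate the backends and the
# Transformers architectures the library actually supports.
BACKENDS = ["transformers", "llamacpp", "mlx"]
MODEL_NAMES = ["hf-internal-testing/tiny-random-gpt2"]
N_REPEATS = 3  # repeat each inference test to surface sampling-related flakiness


def load_steerable_model(backend: str, model_name: str):
    """Hypothetical helper: dispatch to the library's constructor for `backend`."""
    raise NotImplementedError(f"wire up the {backend} constructor here")


@pytest.mark.parametrize("backend", BACKENDS)
@pytest.mark.parametrize("model_name", MODEL_NAMES)
@pytest.mark.parametrize("attempt", range(N_REPEATS))
def test_inference(backend, model_name, attempt):
    model = load_steerable_model(backend, model_name)
    # The exact call depends on the library's generation API; illustrative only.
    output = model("Answer yes or no:")
    assert isinstance(output, str) and output
```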

RobinPicard avatar Aug 08 '25 15:08 RobinPicard

Thanks for raising this. I can help expand the steerable model tests to include multiple model initializations, inference runs across different backends, and repeated runs to detect flaky generation.

Would you prefer this as an extension to the existing test module, or should I create a separate parametrized test suite to keep it isolated?
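If the separate-suite option is chosen, one low-friction way to keep it isolated (a sketch with placeholder test names and a placeholder backend list) is to define the matrix once with a module-level `pytestmark`, so every test in the file runs against each backend without repeating the decorator:

```python
import pytest

# Hypothetical module, e.g. tests/test_steerable_models_matrix.py.
# The module-level mark parametrizes every test below over the backends.
pytestmark = pytest.mark.parametrize("backend", ["transformers", "llamacpp"])


def test_initialization(backend):
    ...  # build the steerable model for this backend


def test_structured_generation(backend):
    ...  # run a generation and check the output shape/type
```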

kudos07 avatar Oct 28 '25 06:10 kudos07

Hi @kudos07! Thanks for offering to work on this. Another element I did not mention in the issue, but that would be worth addressing, is caching the HF models to prevent rate limit errors (see this commit in pydantic-ai).

I'm not sure what I prefer yet, but I think relying on parametrization would definitely make sense. Feel free to open a draft PR that improves a single aspect of the issue so we can discuss the test organization without having to spend time doing everything upfront.
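On the caching suggestion above: without knowing exactly what the pydantic-ai commit does, one possible sketch is a session-scoped fixture that downloads each checkpoint once via `huggingface_hub` and hands tests the local snapshot path, so repeated test runs read from the local cache instead of the Hub (the checkpoint name is a placeholder):

```python
import pytest
from huggingface_hub import snapshot_download


@pytest.fixture(scope="session")
def cached_model_path():
    """Download the test checkpoint once per session; subsequent calls reuse
    the local Hub cache instead of hitting the network."""
    # hf-internal-testing/tiny-random-gpt2 is a placeholder test checkpoint.
    return snapshot_download("hf-internal-testing/tiny-random-gpt2")


def test_initialization(cached_model_path):
    # Load from the local path rather than the repo id, e.g.
    # AutoModelForCausalLM.from_pretrained(cached_model_path).
    ...
```

For the rate-limit benefit to apply in CI, the HF cache directory would still need to be persisted between workflow runs (e.g. with the CI provider's cache mechanism); the fixture alone only deduplicates downloads within a single session.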

RobinPicard avatar Oct 28 '25 08:10 RobinPicard