logfire
LLM qualitative evaluations and labeling
Description
It would be nice to have a place in the platform for qualitative evaluation and labeling of LLM outputs. Another option would be to allow integration with a partner that provides it.
Yup, we're working on this very thing; see https://github.com/pydantic/pydantic-ai/issues/915 and the linked pull request.
That's awesome to see!
One feature that could be interesting, particularly for online performance, is letting another model act as the evaluator instead of a human.
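To make the model-as-evaluator idea concrete, here is a rough sketch of the shape I have in mind. Everything in it is hypothetical: `Label`, `Judge`, `stub_judge`, and `label_batch` are names made up for illustration and are not part of Logfire or PydanticAI. The stub stands in for a real call to an evaluator model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Label:
    """Hypothetical evaluation result; not a Logfire or PydanticAI type."""
    score: int      # 1 (poor) to 5 (excellent)
    rationale: str  # short explanation from the evaluator


# A judge maps (prompt, response) to a Label. In a real setup this would
# wrap a call to an evaluator model; a human reviewer could fill the same role.
Judge = Callable[[str, str], Label]


def stub_judge(prompt: str, response: str) -> Label:
    """Stand-in for a model call: only checks whether the response is empty."""
    if not response.strip():
        return Label(score=1, rationale="empty response")
    return Label(score=4, rationale="non-empty response")


def label_batch(records: list[tuple[str, str]], judge: Judge) -> list[Label]:
    """Apply the judge to each (prompt, response) pair, preserving order."""
    return [judge(prompt, response) for prompt, response in records]


labels = label_batch(
    [("What is 2 + 2?", "4"), ("Name a colour.", "")],
    stub_judge,
)
for label in labels:
    print(label.score, label.rationale)
```

Because the judge is just a callable, the same labeling path could accept either a model-backed evaluator or a human-in-the-loop one, which is the swap I'm suggesting.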