openai-cookbook
Evaluating OpenAI Agents
Summary
This PR adds a guide on evaluating OpenAI agents using Langfuse.
Motivation
This cookbook guides users through the typical evaluation process involved in developing AI agents using the open-source tool Langfuse.
It shows how to perform offline evaluation by looping over a dataset and iterating on agent parameters (e.g., the model or search tool). It also explains how to do online evaluation, i.e., assessing metrics such as cost and latency in a live production environment.
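For reviewers who want a concrete picture of the offline loop, here is a minimal sketch in plain Python. It is not the code from the cookbook and does not use the Langfuse SDK; `DATASET`, `run_agent`, `evaluate`, and the model and tool names are all hypothetical placeholders. It only illustrates looping over a dataset once per parameter combination and recording a score and elapsed time for each.

```python
import time
from itertools import product

# Hypothetical dataset: each item pairs an input with its expected output.
DATASET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]


def run_agent(question, model, search_tool):
    """Stand-in for a real agent call (assumption); swap in your own agent."""
    canned = {"2+2": "4", "capital of France": "Paris"}
    return canned.get(question, "")


def evaluate(dataset, models, search_tools):
    """Run every (model, search_tool) combination over the whole dataset."""
    results = []
    for model, tool in product(models, search_tools):
        correct = 0
        start = time.perf_counter()
        for item in dataset:
            answer = run_agent(item["input"], model, tool)
            # Exact-match scoring; real evaluations often need softer checks.
            correct += int(answer.strip() == item["expected"])
        elapsed = time.perf_counter() - start
        results.append(
            {
                "model": model,
                "search_tool": tool,
                "accuracy": correct / len(dataset),
                "latency_s": elapsed,
            }
        )
    return results


if __name__ == "__main__":
    for row in evaluate(DATASET, ["model-a", "model-b"], ["tool-x"]):
        print(row)
```

In a real setup the per-run results would be sent to a tracing or evaluation tool rather than printed, so that runs with different parameters can be compared side by side.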
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
- [X] I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
- [X] I have conducted a self-review of my content based on the contribution guidelines:
  - [X] Relevance: This content is related to building with OpenAI technologies and is useful to others.
  - [X] Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
  - [X] Spelling and Grammar: I have checked for spelling or grammatical mistakes.
  - [X] Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
  - [X] Correctness: The information I include is correct and all of my code executes successfully.
  - [X] Completeness: I have explained everything fully, including all necessary references and citations.
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.
Hi @lspacagna-oai, what do you think about this addition? Let us know if you have any feedback :)
Thanks for contributing, apologies for the delay in reviewing!