evals
How to eval output with ideal_answer directly without having to define the completion_fn?
Describe the feature or improvement you're requesting
I already have the output (generated by an LLM) and the ideal answers in my jsonl file. For example:
{"input": "what is 2 plus 1?", "output": "3", "ideal": "3"}
{"input": "what is 2 plus 2?", "output": "3", "ideal": "4"}
I don't need to define a completion_fn, because it is only used to generate output, which I already have. So how can I eval the output against the ideal_answer directly? Thanks a lot.
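(For illustration only, and not part of the evals framework: a direct comparison over rows like these amounts to something like the sketch below. The file name is hypothetical.)

```python
import json

# Illustration: direct exact-match comparison of pre-generated outputs
# against ideal answers; "my_samples.jsonl" is a hypothetical file name.
correct = 0
total = 0
with open("my_samples.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        total += 1
        if sample["output"].strip() == sample["ideal"].strip():
            correct += 1

print(f"accuracy: {correct}/{total} = {correct / total:.2f}")
```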
Additional context
No response
Hey @liuyaox,
I'm not entirely sure I've grasped your question accurately, but I'll do my best to help. I am assuming this is intended for your personal use case on a fork of the repo, rather than as a contribution to the main repository. The guidelines are virtually the same in either case, but for brevity I won't go into contribution conventions here; you can find more detailed information in the documentation.
To run this and obtain an evaluation score based on the model's responses, follow these steps:

- Navigate to `evals/registry/data` and create a new folder; considering your examples, you might name it `basic_math`.
- Inside this folder, place your jsonl file. It can be named anything, but let's assume you choose `samples.jsonl`. (A sketch of a typical sample format follows the note below.)

NOTE: Please record the folder name and file name as they will be required shortly.
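As an aside — this is an assumption about how the built-in match-style evals are usually set up, not a hard requirement on your file — sample files in the registry typically use a chat-formatted `input` plus an `ideal` answer, along the lines of:

```json
{"input": [{"role": "system", "content": "Answer as concisely as possible."}, {"role": "user", "content": "what is 2 plus 1?"}], "ideal": "3"}
{"input": [{"role": "system", "content": "Answer as concisely as possible."}, {"role": "user", "content": "what is 2 plus 2?"}], "ideal": "4"}
```

In this flow the completion is generated by the model at eval time, so a pre-generated `output` field would typically just be ignored by the match eval.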
- Proceed to `evals/registry/evals` and create a new yaml file, naming it according to your preference. In this instance, let's use `basic_math.yaml`.
- Populate your yaml file with the necessary configuration. Here is a simplified match template for your reference (a sketch of the resulting file layout follows it). Additional details can be found here.
```yaml
basic_math:
  id: basic_math.dev.v0
  description: Test the model's ability to perform basic math operations.
  metrics: [accuracy]

basic_math.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: basic_math/samples.jsonl  # Note the format here: <foldername>/<filename>.jsonl
```
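For orientation: the top-level `basic_math` key is the eval name you will pass to `oaieval`, and it points to the versioned spec `basic_math.dev.v0`. With the names assumed above, the relevant part of the repo ends up looking like this:

```
evals/registry/
├── data/
│   └── basic_math/
│       └── samples.jsonl
└── evals/
    └── basic_math.yaml
```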
- You can now execute an evaluation using the `oaieval` command from the CLI. Find more details here. Use the following template:

```
oaieval <model you want to test> <eval name>
```

In your scenario, it would be:

```
oaieval gpt-3.5-turbo basic_math
```
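If you first want a quick sanity check, `oaieval` also accepts optional flags such as `--max_samples` to limit how many rows are evaluated (flag availability can vary slightly between versions of the repo):

```
oaieval gpt-3.5-turbo basic_math --max_samples 5
```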
Provided your environment is configured correctly and all files are correctly placed, executing the above command should initiate the evaluation process.
Lastly, please direct future inquiries of this nature to the Discussion Tab, as it is a more appropriate platform for seeking guidance on understanding or running the repo, whereas this tab is meant for reporting implementation issues. You're more likely to receive a response to your question there. I hope this assists you, and I am here for any further queries you may have!