
How can I eval output against ideal_answer directly, without having to define a completion_fn?

liuyaox opened this issue 1 year ago · 1 comment

Describe the feature or improvement you're requesting

I already have the outputs (generated from an LLM) and the ideal answers in my jsonl file. It looks like this:

{"input": "what is 2 plus 1?", "output": "3", "ideal": "3"}
{"input": "what is 2 plus 2?", "output": "3", "ideal": "4"}
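
Conceptually, all I want is something like the following rough sketch in plain Python (the file name samples.jsonl is just for illustration): score the precomputed outputs against the ideal answers by exact match.

import json

# Rough sketch: score precomputed outputs against ideal answers by exact match.
# Assumes every line of samples.jsonl has "input", "output", and "ideal" fields.
with open("samples.jsonl") as f:
    samples = [json.loads(line) for line in f]

correct = sum(1 for s in samples if s["output"].strip() == s["ideal"].strip())
print(f"accuracy: {correct / len(samples):.2%}")  # 50.00% for the two lines above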

I don't need to define a completion_fn, because it's only used to generate the output, which I already have. So how can I eval the output against the ideal answer directly? Thanks a lot.

Additional context

No response

liuyaox · Aug 29 '23

Hey @liuyaox,

I'm not entirely sure I've grasped your question accurately, but I'll do my best to help. I'm assuming this is for your personal use case on a fork of the repo, not a contribution to the main repository. The guidelines are virtually the same either way, but for brevity I won't go into contribution conventions here; you can find more detail in the documentation.

To run this and obtain an evaluation score based on the model's responses, follow these steps:

  1. Navigate to evals/registry/data and create a new folder; considering your examples, you might name it basic_math.
  2. Inside this folder, place your jsonl file. Although it can be named anything, let's assume you choose samples.jsonl.

NOTE: Please record the folder name and file name as they will be required shortly.

  3. Proceed to evals/registry/evals and create a new yaml file, naming it according to your preference. In this instance, let's use basic_math.yaml.
  4. Populate your yaml file with the necessary configurations. Here is a simplified template using the Match eval for your reference. Additional details can be found here.
basic_math:
  id: basic_math.dev.v0
  description: Test the model's ability to perform basic math operations.
  metrics: [accuracy]

basic_math.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: basic_math/samples.jsonl  # Note the format here: <foldername>/<filename>.jsonl

  5. You can now execute an evaluation using the oaieval command from the CLI. Find more details here. Use the following template:
oaieval <model you want to test> <eval name>

In your scenario, it would be:

oaieval gpt-3.5-turbo basic_math

Provided your environment is configured correctly and all files are correctly placed, executing the above command should initiate the evaluation process.
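
One more thought on the "I already have the outputs" part of your question: as far as I can tell, the Match eval only reads the input and ideal fields of each sample and asks the completion function for a fresh completion, so your precomputed output field would simply be ignored. If you want to reuse those outputs instead of regenerating them, one option is a small "replay" completion function that just looks them up. The sketch below is based on my reading of the CompletionFn protocol described in the repo's completion-fns documentation (a callable returning an object that exposes get_completions()); treat the exact interface, the file name precomputed.jsonl, and the ReplayCompletionFn / ReplayCompletionResult names as assumptions and double-check against the docs before relying on it.

import json


class ReplayCompletionResult:
    # Minimal result wrapper: the framework expects get_completions() to
    # return a list of completion strings.
    def __init__(self, response: str):
        self.response = response

    def get_completions(self) -> list:
        return [self.response]


class ReplayCompletionFn:
    # A "completion" function that never calls a model; it returns the
    # precomputed output recorded for each input.
    def __init__(self, outputs_jsonl: str = "precomputed.jsonl", **kwargs):
        # Map each input to its precomputed output.
        self.outputs = {}
        with open(outputs_jsonl) as f:
            for line in f:
                sample = json.loads(line)
                self.outputs[sample["input"]] = sample["output"]

    def __call__(self, prompt, **kwargs) -> ReplayCompletionResult:
        # The prompt may arrive as a plain string or as a list of chat
        # messages, depending on how the eval builds its samples.
        if isinstance(prompt, list):
            prompt = prompt[-1].get("content", "")
        return ReplayCompletionResult(self.outputs.get(prompt, ""))

You would then register this class under evals/registry/completion_fns (see the completion-fns docs for the YAML format) and pass its registered name to oaieval in place of gpt-3.5-turbo.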

Lastly, please direct future inquiries of this nature to the Discussions tab, as it is a more appropriate place for guidance on understanding or running the repo, whereas this tab is meant for reporting implementation issues. You're more likely to receive a response to your question there. I hope this helps, and I'm happy to answer any further questions!

douglasmonsky · Sep 12 '23