
Added eval to predict next number in the series (Accuracy: 0.97, Samples: 100)

Open hmzakhalid opened this issue 1 year ago • 1 comments

Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines; failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨


In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject since GPT-4 is already capable of completing the task.

We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. We encourage partial PRs with ~5-10 examples that we can then run the evals on and share the results with you so you know how your eval does with GPT-4 before writing all 100 examples.

Eval details 📑

Eval name


Eval description

Test the model's ability to predict the next value in a mathematical series.

What makes this a useful eval?

The model should be able to analyze a sequence of numbers and predict what comes next. Because every term in a series follows the same rule, the terms already shown determine the next one.

Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).

Your eval should be:

  • [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
  • [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
  • [x] Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
  • [x] Include at least 100 high quality examples (it is okay to only contribute 5-10 meaningful examples and have us test them with GPT-4 before adding all 100)

If there is anything else that makes your eval worth including, please document it below.

I used the following series to generate the samples: fibonacci, arithmetic_series, geometric_series, harmonic_series, square_series, cube_series, triangular_series.

Here's the code I wrote to generate the samples: https://gist.github.com/hmzakhalid/7f62e5805a322640ab605925a3b9a450
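
For readers who cannot open the gist, here is a minimal sketch of how samples in this format could be produced. It is not the gist's code: the function names, their parameters, and the choice to show ten terms and hold back the eleventh are assumptions made for illustration, and only three of the seven series types are covered.

```python
import json


def fibonacci(start_index, length):
    """Return `length` consecutive Fibonacci numbers starting at F(start_index)."""
    a, b = 0, 1
    for _ in range(start_index):
        a, b = b, a + b
    terms = []
    for _ in range(length):
        terms.append(a)
        a, b = b, a + b
    return terms


def arithmetic_series(first, step, length):
    """Return `length` terms of first, first + step, first + 2*step, ..."""
    return [first + step * i for i in range(length)]


def square_series(start, length):
    """Return the squares of start, start + 1, ... (`length` terms)."""
    return [(start + i) ** 2 for i in range(length)]


def make_sample(terms):
    """Show every term except the last in the prompt; the last term is the ideal answer."""
    *shown, answer = terms
    prompt = "What comes next in the series?: " + " ".join(str(t) for t in shown)
    return {
        "input": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "ideal": str(answer),
    }


if __name__ == "__main__":
    # Eleven terms each: ten shown in the prompt, one held back as the answer.
    for terms in (fibonacci(0, 11), arithmetic_series(71, 9, 11), square_series(7, 11)):
        print(json.dumps(make_sample(terms)))
```

Reversing a list of terms before passing it to `make_sample` would give the descending variants.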

Unique eval value

Insert what makes your eval high quality that was not mentioned above. (Not required)

Eval structure 🏗️

Your eval should

  • [x] Check that your data is in evals/registry/data/{name}
  • [x] Check that your yaml is registered at evals/registry/evals/{name}.yaml
  • [x] Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

  • [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.

  • [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.

Limited availability acknowledgement

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

  • [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.

Submit eval

  • [x] I have filled out all required fields in the evals PR form
  • [x] (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push

Failure to fill out all required fields will result in the PR being closed.

Eval JSON data

Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:

View evals in JSON


{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01"}], "ideal": "0.01"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0 1 1 2 3 5 8 13 21 34"}], "ideal": "55"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 343 512 729 1000 1331 1728 2197 2744 3375 4096"}], "ideal": "4913"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2.33 1.75 1.4 1.17 1.0 0.88 0.78 0.7 0.64 0.58"}], "ideal": "0.54"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 45 55 66 78 91 105 120 136 153 171"}], "ideal": "190"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 66 55 45 36 28 21 15 10 6 3"}], "ideal": "1"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 45 55 66 78 91 105 120 136 153 171"}], "ideal": "190"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 24 22 20 18 16 14 12 10 8 6"}], "ideal": "4"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 71 80 89 98 107 116 125 134 143 152"}], "ideal": "161"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 89 55 34 21 13 8 5 3 2 1"}], "ideal": "1"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 144 121 100 81 64 49 36 25 16 9"}], "ideal": "4"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 49 64 81 100 121 144 169 196 225 256"}], "ideal": "289"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.06 0.06 0.07 0.07 0.08 0.09 0.1 0.11 0.13 0.15"}], "ideal": "0.18"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.21 0.22 0.24 0.25 0.27 0.29 0.31 0.33 0.36 0.4"}], "ideal": "0.44"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 91 78 66 55 45 36 28 21 15 10"}], "ideal": "6"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 81 100 121 144 169 196 225 256 289 324"}], "ideal": "361"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 100 121 144 169 196 225 256 289 324 361"}], "ideal": "400"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1728 1331 1000 729 512 343 216 125 64 27"}], "ideal": "8"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 27 64 125 216 343 512 729 1000 1331 1728"}], "ideal": "2197"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 4913 4096 3375 2744 2197 1728 1331 1000 729 512"}], "ideal": "343"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0 1 1 2 3 5 8 13 21 34"}], "ideal": "55"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 169 144 121 100 81 64 49 36 25 16"}], "ideal": "9"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 49 64 81 100 121 144 169 196 225 256"}], "ideal": "289"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 4 9 16 25 36 49 64 81 100 121"}], "ideal": "144"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.25 0.23 0.21 0.19 0.18 0.17 0.16 0.15 0.14 0.13"}], "ideal": "0.12"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 844424930131968 105553116266496 13194139533312 1649267441664 206158430208 25769803776 3221225472 402653184 50331648 6291456"}], "ideal": "786432"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 5 8 13 21 34 55 89 144 233 377"}], "ideal": "610"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2 2 2 2 2 2 2 2 2 2"}], "ideal": "2"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 171 153 136 120 105 91 78 66 55 45"}], "ideal": "36"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1 1 2 3 5 8 13 21 34 55"}], "ideal": "89"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.07 0.07 0.08 0.09 0.1 0.11 0.13 0.15 0.18 0.22"}], "ideal": "0.3"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 100 121 144 169 196 225 256 289 324 361"}], "ideal": "400"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 54 49 44 39 34 29 24 19 14 9"}], "ideal": "4"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 7 7 7 7 7 7 7 7 7 7"}], "ideal": "7"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.14 0.15 0.17 0.18 0.2 0.22 0.25 0.29 0.33 0.4"}], "ideal": "0.5"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 5 8 13 21 34 55 89 144 233 377"}], "ideal": "610"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1 2 3 5 8 13 21 34 55 89"}], "ideal": "144"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1331 1000 729 512 343 216 125 64 27 8"}], "ideal": "1"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1 1 2 3 5 8 13 21 34 55"}], "ideal": "89"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 21 34 55 89 144 233 377 610 987 1597"}], "ideal": "2584"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2197 1728 1331 1000 729 512 343 216 125 64"}], "ideal": "27"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2744 2197 1728 1331 1000 729 512 343 216 125"}], "ideal": "64"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 21 34 55 89 144 233 377 610 987 1597"}], "ideal": "2584"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.03 0.04 0.04 0.04 0.05 0.06 0.07 0.08 0.1 0.13"}], "ideal": "0.2"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 27 64 125 216 343 512 729 1000 1331 1728"}], "ideal": "2197"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 78 66 55 45 36 28 21 15 10 6"}], "ideal": "3"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 64 81 100 121 144 169 196 225 256 289"}], "ideal": "324"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2304 4608 9216 18432 36864 73728 147456 294912 589824 1179648"}], "ideal": "2359296"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 98 91 84 77 70 63 56 49 42 35"}], "ideal": "28"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 3 5 8 13 21 34 55 89 144 233"}], "ideal": "377"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0"}], "ideal": "0.0"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 91 78 66 55 45 36 28 21 15 10"}], "ideal": "6"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1000 1331 1728 2197 2744 3375 4096 4913 5832 6859"}], "ideal": "8000"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 324 1944 11664 69984 419904 2519424 15116544 90699264 544195584 3265173504"}], "ideal": "19591041024"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 32 38 44 50 56 62 68 74 80 86"}], "ideal": "92"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 4 9 16 25 36 49 64 81 100 121"}], "ideal": "144"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.86 0.43 0.29 0.21 0.17 0.14 0.12 0.11 0.1 0.09"}], "ideal": "0.08"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0 1 1 2 3 5 8 13 21 34"}], "ideal": "55"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 40000000000000000000 4000000000000000000 400000000000000000 40000000000000000 4000000000000000 400000000000000 40000000000000 4000000000000 400000000000 40000000000"}], "ideal": "4000000000"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 3391115364245 484445052035 69206436005 9886633715 1412376245 201768035 28824005 4117715 588245 84035"}], "ideal": "12005"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.22 0.19 0.17 0.15 0.13 0.12 0.11 0.1 0.1 0.09"}], "ideal": "0.08"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 49 64 81 100 121 144 169 196 225 256"}], "ideal": "289"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 91 78 66 55 45 36 28 21 15 10"}], "ideal": "6"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 169 144 121 100 81 64 49 36 25 16"}], "ideal": "9"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.12 0.06 0.04 0.03 0.03 0.02 0.02 0.02 0.01 0.01"}], "ideal": "0.01"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 559872 3359232 20155392 120932352 725594112 4353564672 26121388032 156728328192 940369969152 5642219814912"}], "ideal": "33853318889472"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 4181 2584 1597 987 610 377 233 144 89 55"}], "ideal": "34"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 377 233 144 89 55 34 21 13 8 5"}], "ideal": "3"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 6859 5832 4913 4096 3375 2744 2197 1728 1331 1000"}], "ideal": "729"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 37980492079544 5425784582792 775112083256 110730297608 15818613944 2259801992 322828856 46118408 6588344 941192"}], "ideal": "134456"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1597 987 610 377 233 144 89 55 34 21"}], "ideal": "13"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1 4 9 16 25 36 49 64 81 100"}], "ideal": "121"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.02 0.02 0.03 0.03 0.03 0.03 0.04 0.04 0.05 0.06"}], "ideal": "0.07"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 144 89 55 34 21 13 8 5 3 2"}], "ideal": "1"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 343 512 729 1000 1331 1728 2197 2744 3375 4096"}], "ideal": "4913"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 600000000000000000 60000000000000000 6000000000000000 600000000000000 60000000000000 6000000000000 600000000000 60000000000 6000000000 600000000"}], "ideal": "60000000"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 4096 3375 2744 2197 1728 1331 1000 729 512 343"}], "ideal": "216"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 133514404296875 26702880859375 5340576171875 1068115234375 213623046875 42724609375 8544921875 1708984375 341796875 68359375"}], "ideal": "13671875"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 1 4 9 16 25 36 49 64 81 100"}], "ideal": "121"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 8 13 21 34 55 89 144 233 377 610"}], "ideal": "987"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 8 27 64 125 216 343 512 729 1000 1331"}], "ideal": "1728"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 377 233 144 89 55 34 21 13 8 5"}], "ideal": "3"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 136 120 105 91 78 66 55 45 36 28"}], "ideal": "21"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 49 64 81 100 121 144 169 196 225 256"}], "ideal": "289"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2584 1597 987 610 377 233 144 89 55 34"}], "ideal": "21"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 45 55 66 78 91 105 120 136 153 171"}], "ideal": "190"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 28 31 34 37 40 43 46 49 52 55"}], "ideal": "58"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.19 0.16 0.14 0.13 0.11 0.1 0.1 0.09 0.08 0.08"}], "ideal": "0.07"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.33 0.29 0.25 0.22 0.2 0.18 0.17 0.15 0.14 0.13"}], "ideal": "0.12"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 2 5 8 11 14 17 20 23 26 29"}], "ideal": "32"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 93 87 81 75 69 63 57 51 45 39"}], "ideal": "33"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 8 13 21 34 55 89 144 233 377 610"}], "ideal": "987"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 28 36 45 55 66 78 91 105 120 136"}], "ideal": "153"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.1 0.08 0.07 0.06 0.05 0.04 0.04 0.04 0.03 0.03"}], "ideal": "0.03"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 10 12 14 16 18 20 22 24 26 28"}], "ideal": "30"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 233 144 89 55 34 21 13 8 5 3"}], "ideal": "2"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 22 24 26 28 30 32 34 36 38 40"}], "ideal": "42"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 0.09 0.07 0.06 0.05 0.05 0.04 0.04 0.04 0.03 0.03"}], "ideal": "0.03"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 17 16 15 14 13 12 11 10 9 8"}], "ideal": "7"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What comes next in the series?: 9216 36864 147456 589824 2359296 9437184 37748736 150994944 603979776 2415919104"}], "ideal": "9663676416"}

hmzakhalid, Mar 15 '23 09:03

Related to:

  • #212
  • #223

Ein-Tim, Mar 16 '23 06:03

Closing the PR due to inactivity; please reopen if you get a chance to address comments.

usama-openai, Jun 01 '23 20:06

> Closing the PR due to inactivity; please reopen if you get a chance to address comments.

Hey @usama-openai, just got the email notification. Thank you for considering my contribution. I've made the requested changes.

  • Changed the data to have unique entries
  • Changed the system prompt so the model generates only the next number
  • Changed the evaluator to Match

Please reopen the issue so it can be reviewed and merged.
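
With a system prompt that asks for only the next number, scoring can reduce to a direct string comparison against `ideal`. The sketch below assumes a prefix rule, where a reply counts as correct if it begins with the ideal string. It is a simplified illustration, not the Evals library's implementation, and `is_match` and `accuracy` are hypothetical names.

```python
def is_match(reply, ideal):
    """Return True if the reply, ignoring surrounding whitespace, starts with the ideal answer."""
    return reply.strip().startswith(ideal)


def accuracy(pairs):
    """Fraction of (reply, ideal) pairs that match; 0.0 for an empty input."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_match(reply, ideal) for reply, ideal in pairs) / len(pairs)


if __name__ == "__main__":
    results = [
        ("55", "55"),                      # bare number: counted as correct
        ("The next number is 4.", "4"),    # extra wording before the number: counted as wrong
    ]
    print(accuracy(results))
```

Under a prefix rule a reply of "13" would also match an ideal of "1", so single-digit answers are the weakest signal in this kind of check.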

hmzakhalid, Jun 02 '23 09:06

You should see GPT-4 API access enabled in your account in the next few days.

usama-openai, Jun 02 '23 21:06

I already had the access :D

hmzakhalid, Jun 03 '23 04:06