Poker Hands Analysis Eval (19.8% accuracy)

Open douglasmonsky opened this issue 1 year ago • 3 comments

Thank you for contributing an eval! ♥️

🚨 Please make sure your PR follows these guidelines. Failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨

PLEASE READ THIS:

In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject since GPT-4 is already capable of completing the task.

We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. Starting April 10, the minimum eval count is 15 samples. We hope this makes it easier to create and contribute evals.

Eval details 📑

Eval name

poker_analysis

Eval description

Created 10,000 example Texas Hold'em hands using a custom Python script that simulates and evaluates poker hands to calculate each player's winning and tie probabilities. The script generates hands with varying numbers of players (ranging from 2 to 9) and community cards (3, 4, or 5 cards). The resulting data is formatted as JSON Lines, which are carefully structured to create an effective prompt for GPT-based evaluation.

Each example hand consists of an "input" key containing a list of two dictionaries:

1. The first dictionary has a "role" key with the value "system" and a "content" key providing the task description. In this context, the task is to identify the player with the highest winning probability in a given Texas Hold'em hand.
2. The second dictionary has a "role" key with the value "user" and a "content" key presenting the hand details, which include each player's hole cards and the community cards. For example:
   • Player 1 Hole: (2h, 4s)
   • Player 2 Hole: (Qh, Jc)
   • Community cards: 5d, 9s, Kc

The "ideal" key provides the index (1-based) of the player with the highest probability of winning the hand. In this example, the value is "2", indicating that Player 2 has the greatest chance of winning.
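As a rough illustration of that layout, the sketch below assembles one JSON Lines record in Python. The helper name `make_sample` is hypothetical and is not taken from the author's generator script. The system prompt string is copied verbatim from the samples listed under "Eval JSON data", and the user message follows the comma-separated form those samples use.

```python
import json

# Copied verbatim from the samples in this PR (wording left unchanged on purpose).
SYSTEM_PROMPT = (
    "TASK: You will prompted with a texas hold'em hand. Which player has the "
    "highest probability of winning the hand? Answer with exactly one number "
    "1-9 and no additional information or context."
)


def make_sample(holes, community, best_player):
    """Build one eval record. `make_sample` is a hypothetical helper name."""
    players = ", ".join(
        f"Player {i} Hole: ({a}, {b})" for i, (a, b) in enumerate(holes, start=1)
    )
    user_content = f"{players}, Community: {', '.join(community)}"
    return {
        "input": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        # The ideal answer is the 1-based player index, stored as a string.
        "ideal": str(best_player),
    }


sample = make_sample([("7c", "7s"), ("Ad", "Js")], ["6h", "9s", "Qs"], 1)
line = json.dumps(sample)  # one line of the .jsonl file
print(line)
```

Writing one such `line` per hand, each terminated by a newline, would give a file in the JSON Lines shape described above.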

What makes this a useful eval?

The eval is based on Texas Hold'em poker hands, which keeps the theme consistent across all examples. This enables the model to demonstrate its ability to reason about probabilities and game strategies in a specific context.

Can reveal instances where the model fails to reason about probabilities or game strategies correctly, despite a human being able to perform the task.

Features extensive (10,000) high-quality examples, and the ability to generate millions more algorithmically, with scalable difficulty (More on that below).

Criteria for a good eval ✅

Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).

Your eval should be:

  • [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
  • [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
  • [x] Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
  • [x] Include at least 15 high quality examples.

If there is anything else that makes your eval worth including, please document it below.

Unique eval value

Open-source example generator: The script used for generating these examples has been open-sourced and is now available for evaluation and use at the following URL: https://github.com/douglasmonsky/GPT_poker_eval_sample_generator/blob/main/poker_tools.py.

Near-infinite scalability: The custom Python script allows for the generation of near-infinite examples, ensuring that the evaluation can be extended and adapted as needed. This scalability makes the evaluation a valuable resource for continuous model improvement and performance assessment.

Verifiability of correct answers: The script not only generates the examples but also calculates the correct answers based on poker hand evaluation and probabilities. This ensures that the correct answers can be easily verified, providing a reliable evaluation dataset.
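One way to double-check an "ideal" value independently is to enumerate every possible run-out of the remaining community cards and count how often each player wins or ties. The sketch below does that with a small hand ranker. It is an assumption-laden illustration and not the author's poker_tools.py: the function names `rank5` and `win_and_tie_fractions` are hypothetical, and the author's script may use simulation or a different tie-breaking convention.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def rank5(cards):
    """Score exactly five cards (e.g. 'Kd') as a tuple that compares correctly."""
    values = sorted((RANKS.index(c[0]) + 2 for c in cards), reverse=True)
    counts = Counter(values)
    # Ranks ordered by multiplicity, then value: a full house gives [trips, pair].
    groups = sorted(counts, key=lambda v: (counts[v], v), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    straight_high = 0
    if len(counts) == 5:
        if values[0] - values[4] == 4:
            straight_high = values[0]
        elif values == [14, 5, 4, 3, 2]:  # wheel: A-2-3-4-5 plays as five-high
            straight_high = 5
    if straight_high and flush:
        return (8, [straight_high])
    if shape == [4, 1]:
        return (7, groups)
    if shape == [3, 2]:
        return (6, groups)
    if flush:
        return (5, values)
    if straight_high:
        return (4, [straight_high])
    if shape == [3, 1, 1]:
        return (3, groups)
    if shape == [2, 2, 1]:
        return (2, groups)
    if shape == [2, 1, 1, 1]:
        return (1, groups)
    return (0, values)


def win_and_tie_fractions(holes, board):
    """Return one (win, tie) fraction pair per player over all run-outs."""
    known = {card for hole in holes for card in hole} | set(board)
    deck = [r + s for r in RANKS for s in SUITS if r + s not in known]
    wins = [0] * len(holes)
    ties = [0] * len(holes)
    total = 0
    for runout in combinations(deck, 5 - len(board)):
        full_board = list(board) + list(runout)
        # Best five-card hand out of each player's seven cards.
        scores = [
            max(rank5(five) for five in combinations(list(hole) + full_board, 5))
            for hole in holes
        ]
        top = max(scores)
        leaders = [i for i, s in enumerate(scores) if s == top]
        tally = wins if len(leaders) == 1 else ties
        for i in leaders:
            tally[i] += 1
        total += 1
    return [(w / total, t / total) for w, t in zip(wins, ties)]


# River sample from this PR: the board is complete, so there is a single run-out.
result = win_and_tie_fractions(
    [("3c", "8h"), ("Tc", "4s")], ["As", "5c", "Ad", "7d", "9c"]
)
best_player = max(range(len(result)), key=lambda i: result[i][0]) + 1
print(best_player, result)
```

For this sample both players hold a pair of aces from the board, and Player 2's ten kicker decides it, which agrees with the listed "ideal" of "2". Exact enumeration stays cheap here because at most two community cards are ever missing.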

Extendable, with scalable logic difficulty: The custom Python script can be extended to evaluate the model on deeper levels of critical thinking if there is interest from the OpenAI team. This adaptability enables the assessment of the model's ability to reason about more complex scenarios and handle advanced queries. Here are a few examples:

1. Probabilities of each player winning the hand: The script can be modified to ask the model to compute the winning probabilities for each player, which would test its ability to perform advanced calculations and reason about the game dynamics.
2. Probabilities with removed cards: The script can take into account specific cards removed from the deck, such as [As, Kc, Td], and assess the model's ability to adapt its reasoning to account for this new information when calculating winning probabilities.
3. Best possible hole cards based on the community board: The evaluation can include questions that prompt the model to identify the best possible hole cards given the current community board. This would test the model's understanding of poker hand rankings and its ability to reason about optimal play.
4. Impact of removing a suit from the deck: The script can be adapted to evaluate the model's ability to consider the effects of removing all cards of a specific suit from the deck and how this would change the winning probabilities for each player. This would challenge the model's adaptability and its capability to reason about the game under altered conditions.

By extending the script to incorporate these additional levels of critical thinking, the evaluation can offer a more comprehensive assessment of the model's performance and reasoning capabilities in the context of Texas Hold'em poker hands.
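Extensions 2 and 4 both amount to changing which cards can still be dealt before any probabilities are computed. A minimal sketch of that step follows. `remaining_deck` is a hypothetical helper name, and nothing here is taken from the author's script.

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"


def remaining_deck(known_cards=(), removed_cards=(), removed_suits=()):
    """Cards still available to be dealt (hypothetical helper).

    known_cards:   hole and community cards already on the table
    removed_cards: specific cards taken out of the deck, e.g. ["As", "Kc", "Td"]
    removed_suits: whole suits taken out, e.g. "h" for all hearts
    """
    excluded = set(known_cards) | set(removed_cards)
    return [
        rank + suit
        for rank in RANKS
        for suit in SUITS
        if suit not in removed_suits and rank + suit not in excluded
    ]


# Cards from the two-player example in the eval description.
table = ["2h", "4s", "Qh", "Jc", "5d", "9s", "Kc"]
print(len(remaining_deck(table)))
print(len(remaining_deck(table, removed_cards=["As", "Kc", "Td"])))
print(len(remaining_deck(table, removed_suits="h")))
```

Any win-probability calculation that draws its run-outs from this reduced list would then reflect the removed cards or suit.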

Eval structure 🏗️

Your eval should

  • [x] Check that your data is in evals/registry/data/{name}
  • [x] Check that your yaml is registered at evals/registry/evals/{name}.yaml
  • [x] Ensure you have the right to use the data you submit via this eval

(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data available under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

  • [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.

  • [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.

Limited availability acknowledgement

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

  • [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.

Submit eval

  • [x] I have filled out all required fields in the evals PR form
  • [x] (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push

Failure to fill out all required fields will result in the PR being closed.

Eval JSON data

Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:

View evals in JSON

Eval

{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4h, 7h), Player 2 Hole: (4d, Js), Player 3 Hole: (3d, 6h), Player 4 Hole: (Kd, 8d), Player 5 Hole: (7c, 9d), Community: Th, 8c, 2s"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (6h, 9c), Player 2 Hole: (Ts, 4s), Community: Tc, 9d, 7s, 4d"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (3h, 4d), Player 2 Hole: (2c, Ac), Player 3 Hole: (Kd, Td), Player 4 Hole: (Jc, 9h), Player 5 Hole: (Tc, 6d), Player 6 Hole: (5d, 5c), Community: Th, Kc, As, 2h, Qh"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (6h, Qh), Player 2 Hole: (3d, 9s), Player 3 Hole: (Ah, 5s), Player 4 Hole: (Qd, Kh), Player 5 Hole: (5c, Jd), Player 6 Hole: (Ks, Jc), Player 7 Hole: (8h, Ac), Community: Qs, 6s, Tc, 9d, 5h"}], "ideal": "6"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (3h, Th), Player 2 Hole: (Kc, 6d), Player 3 Hole: (2h, Ts), Player 4 Hole: (Ad, 9c), Player 5 Hole: (Tc, 2c), Player 6 Hole: (3s, 5s), Player 7 Hole: (Qd, Qh), Player 8 Hole: (5c, 8c), Player 9 Hole: (8d, Ks), Community: Ac, Jd, 8h"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (6c, 3s), Player 2 Hole: (4d, Td), Player 3 Hole: (Ts, 3d), Player 4 Hole: (5h, 7c), Player 5 Hole: (7d, Qc), Community: Qh, As, 6h, Tc"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4s, 7c), Player 2 Hole: (9h, 9d), Player 3 Hole: (3c, 4d), Player 4 Hole: (4h, Ac), Player 5 Hole: (As, 5c), Player 6 Hole: (9c, 8s), Player 7 Hole: (7d, Ad), Player 8 Hole: (Qc, 2c), Community: 7s, Tc, Ts, 6h"}], "ideal": "6"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Kh, 8h), Player 2 Hole: (2c, Ah), Player 3 Hole: (5s, Kd), Player 4 Hole: (2s, Ks), Player 5 Hole: (Jd, Qs), Player 6 Hole: (Tc, 9c), Player 7 Hole: (8s, 7s), Community: 7c, Kc, 7d"}], "ideal": "7"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Qd, 5d), Player 2 Hole: (Th, Tc), Player 3 Hole: (3s, Js), Player 4 Hole: (5c, Td), Player 5 Hole: (8c, 7h), Player 6 Hole: (3c, Jc), Player 7 Hole: (Ac, 4d), Player 8 Hole: (Qc, Ah), Player 9 Hole: (9h, Ks), Community: 8h, 6h, Ad"}], "ideal": "8"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (6s, 2s), Player 2 Hole: (Tc, Jc), Player 3 Hole: (Td, Ah), Player 4 Hole: (9h, 4d), Community: 6h, 5d, Th, 7s"}], "ideal": "3"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4s, Kc), Player 2 Hole: (3c, 5s), Player 3 Hole: (2h, 3s), Player 4 Hole: (5d, 7d), Player 5 Hole: (Tc, Kd), Player 6 Hole: (7h, 8h), Player 7 Hole: (Ts, Kh), Community: 4h, 7s, Qh, 7c, Jd"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (As, Qc), Player 2 Hole: (9h, Kh), Player 3 Hole: (3h, 5d), Player 4 Hole: (2d, Qs), Player 5 Hole: (Kc, 7s), Player 6 Hole: (Ks, 9s), Community: 7d, 3d, 6d, Kd"}], "ideal": "3"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4d, Qh), Player 2 Hole: (4c, 9s), Player 3 Hole: (7s, 8d), Player 4 Hole: (6c, 7c), Player 5 Hole: (8c, 2c), Player 6 Hole: (4h, Ad), Community: Kc, 5s, Ac, 8h, 9d"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (3c, 8h), Player 2 Hole: (Tc, 4s), Community: As, 5c, Ad, 7d, 9c"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4h, 4d), Player 2 Hole: (7h, Jc), Player 3 Hole: (6s, 7s), Player 4 Hole: (8c, 5d), Player 5 Hole: (8d, Ah), Player 6 Hole: (Kc, 6h), Community: 8s, 6d, Ad"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (9c, Qd), Player 2 Hole: (2s, 7s), Player 3 Hole: (2c, 8d), Player 4 Hole: (4h, 3s), Player 5 Hole: (Ks, 8c), Community: 5h, Td, 9d, 7c, 4c"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (8d, Td), Player 2 Hole: (Jh, Kh), Player 3 Hole: (2c, 3h), Player 4 Hole: (7c, Th), Player 5 Hole: (5c, As), Player 6 Hole: (7h, 8c), Community: Ts, 5s, Ad"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (6d, Ac), Player 2 Hole: (9h, Ts), Player 3 Hole: (4h, 5h), Player 4 Hole: (3c, 2d), Player 5 Hole: (Jh, Th), Player 6 Hole: (3h, 5s), Player 7 Hole: (9s, 2c), Community: 5c, Qd, 8c"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (2d, Ac), Player 2 Hole: (Qc, 7d), Player 3 Hole: (7s, Th), Player 4 Hole: (Kh, 2c), Player 5 Hole: (5h, 8d), Player 6 Hole: (6s, 9h), Player 7 Hole: (8s, Ad), Player 8 Hole: (7h, 6c), Player 9 Hole: (Qd, 8c), Community: Qs, 5c, 3h, Ts"}], "ideal": "9"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (7s, Ac), Player 2 Hole: (4h, 7c), Community: Qs, 4d, Jc"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Th, 2s), Player 2 Hole: (7h, Jh), Player 3 Hole: (Js, Ah), Player 4 Hole: (8s, 5d), Community: Qh, 8c, Ts, 5c, Jc"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Jd, 7h), Player 2 Hole: (6c, Tc), Player 3 Hole: (4s, 7s), Player 4 Hole: (Qc, 4c), Player 5 Hole: (3s, As), Player 6 Hole: (Kd, 3c), Community: 2h, 8d, 4h"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Tc, Qc), Player 2 Hole: (2h, 2s), Player 3 Hole: (Kd, Ac), Player 4 Hole: (Qh, 3d), Player 5 Hole: (9s, 9h), Player 6 Hole: (7d, 2c), Community: 5d, Ts, Kh, 9c, Th"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (3h, 8h), Player 2 Hole: (Ah, Tc), Player 3 Hole: (Qd, 7s), Player 4 Hole: (Js, 2h), Player 5 Hole: (7c, 3s), Player 6 Hole: (Ac, 9c), Player 7 Hole: (Jc, Ad), Player 8 Hole: (6c, Kc), Community: 5c, Kd, 5h"}], "ideal": "8"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Qc, 2d), Player 2 Hole: (Qh, 4c), Player 3 Hole: (Kc, 9s), Player 4 Hole: (Ks, Ah), Player 5 Hole: (8c, Tc), Player 6 Hole: (Kd, 7c), Player 7 Hole: (Td, 9d), Player 8 Hole: (3d, 2c), Community: 4s, 7h, Jh"}], "ideal": "6"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Jd, 5h), Player 2 Hole: (Ad, 8h), Player 3 Hole: (7h, Kc), Player 4 Hole: (5s, 7c), Player 5 Hole: (5d, 4d), Player 6 Hole: (2c, 3d), Player 7 Hole: (6s, 7d), Player 8 Hole: (Qs, Th), Community: Js, 7s, 2h, Jh"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Kh, 4s), Player 2 Hole: (9d, 2s), Player 3 Hole: (4h, 5s), Community: 8h, Js, 8s, Jd"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4s, 7c), Player 2 Hole: (8c, Ts), Community: 8s, 5h, 5s, Ah, Qs"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (2d, 8d), Player 2 Hole: (Ac, 9c), Player 3 Hole: (Tc, Jh), Player 4 Hole: (Js, Ts), Player 5 Hole: (7c, 7h), Player 6 Hole: (6h, 8s), Community: Th, Kc, Qs, 2c, Ah"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (8h, 2d), Player 2 Hole: (2c, 3d), Player 3 Hole: (4c, Ah), Player 4 Hole: (Tc, 4s), Player 5 Hole: (Jc, 4d), Player 6 Hole: (8d, Jh), Player 7 Hole: (Ts, 8s), Player 8 Hole: (9h, 7s), Community: 6d, 5s, 6s, Qh"}], "ideal": "3"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (9d, 5h), Player 2 Hole: (4s, 4d), Player 3 Hole: (7s, 3s), Player 4 Hole: (3c, Qh), Player 5 Hole: (Ks, Qc), Player 6 Hole: (6s, Qs), Player 7 Hole: (8s, 8c), Player 8 Hole: (8d, 9s), Community: As, 6h, 2h"}], "ideal": "7"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Jh, Ts), Player 2 Hole: (6c, Qh), Player 3 Hole: (Qd, As), Player 4 Hole: (Tc, 2h), Player 5 Hole: (Js, Ac), Player 6 Hole: (Jd, 8c), Player 7 Hole: (2d, 5s), Player 8 Hole: (5d, 9s), Community: Jc, 3d, 4c"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (3h, Ks), Player 2 Hole: (9h, 8c), Player 3 Hole: (5s, Jh), Player 4 Hole: (8s, 3c), Player 5 Hole: (6s, Qs), Player 6 Hole: (Kc, Qc), Player 7 Hole: (4d, 3d), Community: 5d, Kd, 6h, 6d"}], "ideal": "7"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (5c, Jh), Player 2 Hole: (Td, As), Player 3 Hole: (Qs, Qd), Player 4 Hole: (Qc, 2d), Community: Qh, 7d, 6s"}], "ideal": "3"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Tc, 4c), Player 2 Hole: (6s, 5s), Player 3 Hole: (Jh, 2d), Player 4 Hole: (9d, Ah), Player 5 Hole: (5h, 8s), Player 6 Hole: (3c, 4s), Player 7 Hole: (Td, Qd), Player 8 Hole: (Ks, 2c), Player 9 Hole: (8h, 2h), Community: 9c, Js, 5c, Jd"}], "ideal": "3"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Js, Kh), Player 2 Hole: (As, 8s), Player 3 Hole: (Ac, 5h), Player 4 Hole: (8c, 9s), Player 5 Hole: (Qc, Jd), Player 6 Hole: (Ks, 6d), Player 7 Hole: (Td, 9d), Player 8 Hole: (6h, Ad), Community: 4c, 9h, 2h"}], "ideal": "7"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (4c, Kc), Player 2 Hole: (Th, 9d), Player 3 Hole: (5d, 8d), Player 4 Hole: (6s, 8s), Community: 3c, 4s, 9h, Qs, Td"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Qh, 3h), Player 2 Hole: (8d, 5c), Player 3 Hole: (Th, 9c), Community: 9h, Kd, 2s, Qd, Kh"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (7s, Th), Player 2 Hole: (4s, Jd), Player 3 Hole: (Tc, Jc), Player 4 Hole: (Ah, 6s), Community: 4h, 8s, 2d, 4c, Td"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Ah, 7h), Player 2 Hole: (Kc, 9d), Player 3 Hole: (7d, 4s), Player 4 Hole: (Td, 7c), Player 5 Hole: (3c, Kd), Player 6 Hole: (9h, Ad), Player 7 Hole: (4h, Qd), Player 8 Hole: (Ac, 5c), Player 9 Hole: (8h, 5d), Community: Jd, Qc, 7s, Jc"}], "ideal": "7"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (5h, Qc), Player 2 Hole: (6d, 3s), Player 3 Hole: (Ad, Ks), Player 4 Hole: (8c, 7c), Player 5 Hole: (Ac, 7h), Player 6 Hole: (5c, 4c), Community: 6h, 6s, 6c, 4s"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (2c, 5c), Player 2 Hole: (8d, 4d), Player 3 Hole: (Jc, Ah), Player 4 Hole: (7s, 6h), Player 5 Hole: (Jh, Ks), Community: Qh, 4h, Kc"}], "ideal": "5"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Ah, Kc), Player 2 Hole: (4s, 2h), Player 3 Hole: (Jc, 6c), Player 4 Hole: (Kh, 5c), Player 5 Hole: (4c, 3s), Player 6 Hole: (Ac, As), Player 7 Hole: (9c, Qd), Community: 6h, 3c, Ad, 4d"}], "ideal": "6"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (7c, 7s), Player 2 Hole: (Ad, Js), Community: 6h, 9s, Qs"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (Ts, 7h), Player 2 Hole: (2c, Qd), Player 3 Hole: (9h, 2h), Player 4 Hole: (9s, Ks), Player 5 Hole: (3c, 5s), Player 6 Hole: (Td, Jh), Community: 4c, Ah, As"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (5c, Qh), Player 2 Hole: (8d, 4s), Player 3 Hole: (Kd, 5d), Player 4 Hole: (Qs, 9c), Player 5 Hole: (6d, Kc), Player 6 Hole: (3h, 8s), Player 7 Hole: (Th, Qc), Community: 9s, Js, Qd, 5h, 4d"}], "ideal": "4"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (7s, Td), Player 2 Hole: (7c, Qc), Player 3 Hole: (Th, Ac), Player 4 Hole: (8c, 6d), Player 5 Hole: (2s, Qs), Player 6 Hole: (Ts, 5d), Community: 5s, Kh, 2c, 5c"}], "ideal": "6"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (As, 7c), Player 2 Hole: (7s, 6h), Community: Kh, 6d, 9c, 6c"}], "ideal": "2"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (8d, Js), Player 2 Hole: (6h, 7s), Player 3 Hole: (Kh, 2s), Player 4 Hole: (Ks, 2h), Player 5 Hole: (7h, 3s), Community: Kc, Jh, Tc"}], "ideal": "1"}
{"input": [{"role": "system", "content": "TASK: You will prompted with a texas hold'em hand. Which player has the highest probability of winning the hand? Answer with exactly one number 1-9 and no additional information or context."}, {"role": "user", "content": "Player 1 Hole: (5d, 6s), Player 2 Hole: (Th, 7c), Player 3 Hole: (8s, Qc), Community: 8h, 3s, Ts, 2c"}], "ideal": "2"}

douglasmonsky • Apr 20 '23 14:04

whoops not done yet.

Edit: I've completed the necessary changes as discussed. Please let me know if you require any further adjustments. I have two other pending pull requests (#752 and #760) that require similar modifications. I'll proceed with those corrections once I have your confirmation that the current changes meet your requirements/standards.

douglasmonsky • May 27 '23 11:05

Thanks for implementing the requested changes. Kindly revert changes in the evals/cli/oaievalset.py file. If you want to push any changes other than the eval submission, do it in a separate PR.

Thank you for your feedback @usama-openai. I have reverted the changes in the evals/cli/oaievalset.py file as requested. I would appreciate it if you could review this and let me know whether the method I used to revert the file is acceptable in this context, or whether you would prefer that I revert the entire commit and resubmit as though the file had never been modified. I appreciate your time and guidance.

douglasmonsky • Jun 03 '23 06:06

Yes, this method is acceptable.

usama-openai • Jun 03 '23 17:06

You should see GPT-4 API access enabled in your account in the next few days.

usama-openai • Jun 05 '23 17:06