evals
evals copied to clipboard
Add last-word-nth test
Eval details 📑
Eval name
last-word-nth
Eval description
Test the model's ability to tell what the last word of a sentence is, but by asking it indirectly based on its index.
For instance: What is the 7th word in "I saw a cat in the garden"? Expected:garden
(edit: note that the actual jsonl file includes additional instructions to prevent the models from adding additional explanations or quote marks to its output)
What makes this a useful eval?
GPT models are known for having a hard time with those sort of indexing tasks, so it makes sense to evaluate progress on this front.
Criteria for a good eval ✅
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means either a correct answer for
Basicevals or theFactModel-graded eval, or an exhaustive rubric for evaluating answers for theCriteriaModel-graded eval. - [x] Include at least 100 high quality examples
If there is anything else that makes your eval worth including, please document it below.
Unique eval value
Insert what makes your eval high quality that was not mentioned above. (Not required)
Eval structure 🏗️
Your eval should
- [x] Check that your data is in
evals/registry/data/{name} - [x] Check that your yaml is registered at
evals/registry/evals/{name}.jsonl - [x] Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
Final checklist 👀
Submission agreement
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
Email address validation
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.
- [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
Limited availability acknowledgement
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.
Submit eval
- [x] I have filled out all required fields in the evals PR form
- [x] (Ignore if not submitting code) I have run
pip install pre-commit; pre-commit installand have verified thatblack,isort, andautoflakeare running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Eval JSON data
Since we are using Git LFS, we are asking eval submitters to add in their first 100 JSONL eval lines.
View evals in JSON
Eval
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 4th word in \"I am a programmer\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "programmer"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 11th word in \"Today is a beautiful day becasue I will go to paris\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "paris"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 4th word in \"What is your name\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "name"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "pip install pre-commit; pre-commit install Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "garden"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"It was a sunny day\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "day"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"The sky was blue and the birds were singing\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "singing"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The wind was blowing softly\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "softly"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I heard a car passing by\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "by"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I walked down the street\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "street"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I saw a small dog\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "dog"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"He was playing with a ball\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "ball"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I stopped to pet him\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "him"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 4th word in \"He was so cute\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "cute"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The sun was shining brightly\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "brightly"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The trees were swaying in the wind\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "wind"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I went to the park\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "park"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The grass was so green\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "green"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I saw a couple walking hand in hand\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "hand"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"They were smiling and laughing\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "laughing"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"It was a lovely sight\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sight"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I watched them for a while\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "while"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"There was a fountain nearby\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "nearby"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The water was sparkling in the sunlight\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sunlight"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I decided to take a walk\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "walk"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"The path was wide and lined with trees\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "trees"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 4th word in \"I passed a playground\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "playground"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"Children were playing and laughing\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "laughing"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I could hear their joyful voices\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "voices"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The path was leading to a lake\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "lake"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The lake was calm and peaceful\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "peaceful"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could see ducks swimming in the water\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "water"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The sun was setting behind the hills\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "hills"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"The sky was painted in shades of pink and orange\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "orange"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The air was cool and refreshing\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "refreshing"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I smelled the fragrant flowers\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "flowers"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I continued walking and soon came to a bridge\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "bridge"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"The bridge was arched and made of stone\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "stone"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"It was a beautiful sight\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sight"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I stopped to take a picture\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "picture"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The sun was low in the sky\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sky"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I could see the stars twinkling in the dark\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "dark"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The night was silent and still\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "still"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could hear the sound of the wind\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "wind"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"I looked up at the sky and saw the moon\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "moon"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"It was a beautiful sight\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sight"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I stayed for a while and watched the stars\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "stars"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The night was peaceful and serene\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "serene"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"I took a deep breath and felt the fresh air\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "air"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I started to feel relaxed and at peace\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "peace"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"I decided to go home\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "home"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I walked back slowly and felt content\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "content"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I enjoyed my walk and the peaceful night\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "night"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"I made my way back home and went to sleep\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sleep"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I had a wonderful day and I was happy\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "happy"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I woke up early the next day\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "day"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I could hear the birds chirping outside\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "outside"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The sun was rising in the sky\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "sky"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"The sky was a beautiful shade of blue\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "blue"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I went outside and took a deep breath\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "breath"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The air was cool and fresh\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "fresh"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could smell the flowers in the garden\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "garden"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I started my day with a walk\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "walk"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I followed the path along the river\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "river"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The river was flowing gently\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "gently"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I could see the trees swaying in the breeze\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "breeze"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The sun was shining brightly\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "brightly"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"I stopped to watch the ducks swimming in the water\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "water"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could hear the sound of the river\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "river"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"The path led me to a meadow\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "meadow"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The grass was lush and green\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "green"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 4th word in \"The wildflowers were blooming\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "blooming"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could see butterflies fluttering in the air\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "air"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I continued walking and soon came to a bridge\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "bridge"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"It was an old wooden bridge\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "bridge"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I stopped to admire the view\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "view"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 5th word in \"The river was flowing underneath\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "underneath"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could see the mountains in the distance\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "distance"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 15th word in \"The sun was setting and the sky was painted in shades of orange and pink\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "pink"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I watched the sunset until it disappeared\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "disappeared"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I turned around and started to walk back\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "back"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The night was silent and still\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "still"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I could hear the sound of the crickets\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "crickets"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 11th word in \"I looked up at the sky and saw a million stars\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "stars"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I felt so small and insignificant\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "insignificant"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"The night was peaceful and I felt content\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "content"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 8th word in \"I went back home and went to bed\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "bed"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I had a wonderful day and I was happy\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "happy"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I woke up early the next morning\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "morning"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"The sun was shining and the birds were chirping\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "chirping"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"I took a deep breath and enjoyed the fresh air\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "air"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The sky was a bright blue\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "blue"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"I got ready and went outside\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "outside"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I decided to go for a walk\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "walk"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 6th word in \"The streets were quiet and peaceful\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "peaceful"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 11th word in \"I looked around and saw the trees swaying in the wind\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "wind"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 11th word in \"The flowers were blooming and the grass was a vibrant green\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "green"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 9th word in \"I kept walking and soon came to a park\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "park"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 10th word in \"The park was full of people enjoying the sunny day\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "day"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 7th word in \"I saw children running around and playing\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "playing"}
{"input": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the 11th word in \"I watched them for a while and then continued my walk\"? Answer directly with the word, make no additional comment, don't use quotation marks."}], "ideal": "walk"}
ChatGPT (GPT-3) result is, to phrase it friendly, interesting 😄

@Ein-Tim That must be taken into account indeed, but in the samples.json file I used, I explicitly asked it to only answer with a single word
@Ein-Tim Thanks! You made me want to double check it and I realized it would also attempt to enclose it in quotation marks, so I asked it explicitly not to do that
Based on eval logs I can confirm gpt-3.5-turbo follows the instructions by only outputting a single word