Add Simple General Knowledge Eval
Thank you for contributing an eval! ♥️
🚨 Please make sure your PR follows these guidelines, failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee the PR will be merged nor GPT-4 access granted. 🚨
Eval details 📑
Eval name
General Knowledge
Eval description
Tests the model's ability to concisely answer some simple general knowledge questions.
What makes this a useful eval?
The model should be able to answer questions concisely when the user requests it, instead of over-explaining.
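To illustrate why conciseness matters for an eval like this, here is a minimal sketch comparing a strict prefix check with a lenient substring check. The names `strict_match` and `lenient_match` are hypothetical and are not the evals library's graders, and the two completions are made up for illustration.

```python
def strict_match(completion: str, ideal: str) -> bool:
    """Hypothetical strict check: the completion must start with the ideal answer."""
    return completion.strip().startswith(ideal)


def lenient_match(completion: str, ideal: str) -> bool:
    """Hypothetical lenient check: the ideal answer may appear anywhere, ignoring case."""
    return ideal.lower() in completion.lower()


# Made-up completions for "What is the capital of Colorado in the United States?"
concise = "Denver"
verbose = "The capital of Colorado is Denver."

for completion in (concise, verbose):
    print(
        repr(completion),
        "strict:", strict_match(completion, "Denver"),
        "lenient:", lenient_match(completion, "Denver"),
    )
```

Under a strict check of this kind, a correct but wordy answer counts as a failure, so the score reflects brevity as well as knowledge.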
Criteria for a good eval ✅
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means either a correct answer for Basic evals or the Fact Model-graded eval, or an exhaustive rubric for evaluating answers for the Criteria Model-graded eval.
- [x] Include at least 100 high quality examples
If there is anything else that makes your eval worth including, please document it below.
Unique eval value
Insert what makes your eval high quality that was not mentioned above. (Not required)
Eval structure 🏗️
Your eval should
- [x] Check that your data is in evals/registry/data/{name}
- [x] Check that your yaml is registered at evals/registry/evals/{name}.jsonl
- [x] Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
Final checklist 👀
Submission agreement
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
Email address validation
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.
- [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
Limited availability acknowledgement
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.
Submit eval
- [x] I have filled out all required fields in the evals PR form
- [ ] (Ignore if not submitting code) I have run pip install pre-commit; pre-commit install and have verified that black, isort, and autoflake are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Eval JSON data
Since we are using Git LFS, we are asking eval submitters to add in their first 100 JSONL eval lines.
View evals in JSON
Eval
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When did the United States become the first nation to land a man on the moon?"}], "ideal": "1969"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "How many articles are there in the Constitution of the United States?"}], "ideal": "7"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the President of the United States from August 1974 to January 1977?"}], "ideal": "Gerald Rudolph Ford Jr."}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which country is Lake Huron located in?"}], "ideal": "United States of America"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Herbert Hoover belonged to which political party?"}], "ideal": "Republican"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Grover Cleveland belonged to which political party in US?"}], "ideal": "Democratic"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Name the national bird and national animal of the United States of America?"}], "ideal": "The Bald Eagle"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What was the world's first artificial Satellite?"}], "ideal": "Sputnik I"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Where is the city of Hobbs located in the United States of America?"}], "ideal": "New Mexico"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Where is Borah Peak located in the United States of America?"}], "ideal": "Idaho"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the total length of the Rocky Mountains?"}], "ideal": "3,000 mi"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which is the Highest point in the Rocky Mountains?"}], "ideal": "Mount Elbert"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is currently the largest bill in US Currency?"}], "ideal": "$100 bill"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was the American Revolutionary War fought?"}], "ideal": "1775 to 1783"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "According to 2018 reports, what was the population of United States of America?"}], "ideal": "32,75,42,690"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is current Vice President of the United States?"}], "ideal": "Kamala Harris"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Where is the National Air and Space Museum in the United States of America?"}], "ideal": "Washington, DC"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was Edwin Hubble in the United States?"}], "ideal": "An Astronomer"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was the Golden Gate Bridge construction started on?"}], "ideal": "5 January 1933"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who discovered the Golden Delicious tree?"}], "ideal": "Anderson Mullins"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who discovered Pluto?"}], "ideal": "Clyde Tombaugh"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is 46th President of the United States?"}], "ideal": "Joe Biden"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "According to land area, on which position does the United States stand in the world?"}], "ideal": "Third"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the capital of Colorado in the United States?"}], "ideal": "Denver"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the height of the Golden Gate Bridge?"}], "ideal": "227m"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the total length of the Golden Gate Bridge?"}], "ideal": "2,737"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the length of the Missouri river?"}], "ideal": "2,341 miles"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Name the national flower of United States of America?"}], "ideal": "Rose"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which is the largest city in Washington?"}], "ideal": "Seattle"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is the current secretary of state in United States of America?"}], "ideal": "Rex W. Tillerson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was Serena Williams born?"}], "ideal": "26 September 1981"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who founded Philadelphia?"}], "ideal": "William Penn"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Where is Mauna Loa Volcano"}], "ideal": "Hawaii"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When were the 1st Emmy Awards presented in the United States of America?"}], "ideal": "January 25, 1949"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was the United States flag adopted?"}], "ideal": "June 14, 1777"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the first President of the United States?"}], "ideal": "George Washington"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the capital of California?"}], "ideal": "Sacramento"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which state is known as the Sunflower State?"}], "ideal": "Kansas"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the largest state in the United States by land area?"}], "ideal": "Alaska"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which state is known as the Lone Star State?"}], "ideal": "Texas"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the name of the national anthem of the United States?"}], "ideal": "The Star-Spangled Banner"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the first woman to serve on the United States Supreme Court?"}], "ideal": "Sandra Day OConnor"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the largest city in the United States by population?"}], "ideal": "New York City"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who invented the telephone?"}], "ideal": "Alexander Graham Bell"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "In what year did the United States enter World War II?"}], "ideal": "1941"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the first African American President of the United States?"}], "ideal": "Barack Obama"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which amendment to the US Constitution abolished slavery?"}], "ideal": "13th Amendment"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who assassinated President John F. Kennedy?"}], "ideal": "Lee Harvey Oswald"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who wrote the Declaration of Independence?"}], "ideal": "Thomas Jefferson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the smallest state in the United States?"}], "ideal": "Rhode Island"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the first American to orbit the Earth?"}], "ideal": "John Glenn"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the name of the highest mountain in North America?"}], "ideal": "Denali"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which US president signed the Civil Rights Act of 1964 into law?"}], "ideal": "Lyndon B. Johnson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the name of the largest desert in North America?"}], "ideal": "Great Basin Desert"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is credited with inventing the light bulb?"}], "ideal": "Thomas Edison"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When is the day of independence?"}], "ideal": "4th July"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the length of the Grand Canyon?"}], "ideal": "277 miles"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is the 40th governor of California?"}], "ideal": "Gavin Newsom."}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the 3rd president of United States of America?"}], "ideal": "Thomas Jefferson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the total length of Erie Canal?"}], "ideal": "524 miles"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was Barack Obama born?"}], "ideal": "4 August 1961"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who had the longest tenure as President of USA?"}], "ideal": "Franklin D. Roosevelt"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the fourth president of united states of America?"}], "ideal": "James Madison"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the number of judges in Federal Supreme Court?"}], "ideal": "Nine"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When the constitution of America was adopted?"}], "ideal": "1788"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the total length of the Red River in United States?"}], "ideal": "1,360 miles"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which city is known as the Big Apple?"}], "ideal": "New York City"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which country has the largest economy National GDP?"}], "ideal": "United States of America"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When the war between USA and Spain was fought?"}], "ideal": "1898"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the currency of the USA?"}], "ideal": "United States Dollar USD"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "How many stars are in the flag of USA?"}], "ideal": "50"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "How many states are there in the US?"}], "ideal": "50"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the civil rights leader who fought through nonviolent action?"}], "ideal": "Martin Luther King Jr."}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was the first person to walk on the moon?"}], "ideal": "Neil Armstrong"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What was the name of the ship that brought the Pilgrims to New England in 1620?"}], "ideal": "Mayflower"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Where's the White House located?"}], "ideal": "Washington, D.C."}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What organization tries to find solutions to world problems and disputes?"}], "ideal": "The United Nations"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What does IMF stand for?"}], "ideal": "International Monetary Fund"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who said: We hold these truths to be self evident that all men are created equal?"}], "ideal": "Thomas Jefferson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "When was the nineteenth amendment to the Constitution giving women the right to vote ratified?"}], "ideal": "1920"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who ran against George W. Bush in the Presidential election of 2000?"}], "ideal": "Al Gore"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the most well-known canyon in the United States?"}], "ideal": "Grand Canyon"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who was President of the U.S. during the Civil War?"}], "ideal": "Abraham Lincoln"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What does the Hershey company make?"}], "ideal": "Chocolate"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What does the S stand for in ESPN?"}], "ideal": "Sports"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What American city has the largest casino industry?"}], "ideal": "Las Vegas"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the last name of Homer, Marge, Bart, Maggie, and Lisa?"}], "ideal": "Simpson"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What is the city with the largest population in Illinois?"}], "ideal": "Chicago"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which American auto companies are known as the Big 3?"}], "ideal": "General Motors, Ford and Chrysler"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who created the cartoon character Mickey Mouse?"}], "ideal": "Walt Disney"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What sport is played at Pebble Beach?"}], "ideal": "Golf"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "During the Civil War, Union soldiers wore blue. What color did rebel soldiers wear?"}], "ideal": "Gray"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What country attacked the United States on a day that will live in infamy?"}], "ideal": "Japan"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "In what board game can you trade in 4 houses for a hotel?"}], "ideal": "Monopoly"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What widely-disliked company is the largest provider of cable TV in the US?"}], "ideal": "Comcast"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "What product was introduced by Steve Jobs in 2007?"}], "ideal": "iPhone"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is all-time leading scorer for the Cleveland Cavaliers?"}], "ideal": "LeBron James"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Who is considered the discoverer of America in US and European popular culture?"}], "ideal": "Cristopher Columbus"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which laws mandated racial segregation in all public facilities in the states of the former Confederate States of America?"}], "ideal": "Jim Crow laws"}
{"input": [{"role": "system", "content": "Answer the following questions as concisely as possible."}, {"role": "user", "content": "Which battle is considered the turning point of the American Revolution?"}], "ideal": "Battle of Saratoga"}
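Each line above is a JSON object with an "input" list of chat messages and an "ideal" string. As a rough sketch, a check like the following could catch malformed lines before submitting. The function name `validate_sample_line` is hypothetical, and it only checks the shape used in this PR, not everything the evals library may accept.

```python
import json


def validate_sample_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        sample = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(sample, dict):
        return ["top-level value is not an object"]

    problems = []
    messages = sample.get("input")
    if not isinstance(messages, list) or not messages:
        problems.append("'input' must be a non-empty list of messages")
    else:
        for i, message in enumerate(messages):
            # Every message in this PR's data carries a role and a content field.
            if not isinstance(message, dict) or not {"role", "content"}.issubset(message):
                problems.append(f"message {i} lacks 'role' or 'content'")

    ideal = sample.get("ideal")
    if not isinstance(ideal, str) or not ideal.strip():
        problems.append("'ideal' must be a non-empty string")
    return problems


good = (
    '{"input": [{"role": "system", "content": "Answer the following questions '
    'as concisely as possible."}, {"role": "user", "content": "What is the '
    'capital of California?"}], "ideal": "Sacramento"}'
)
bad = '{"input": [], "ideal": ""}'

print(validate_sample_line(good))
print(validate_sample_line(bad))
```

Running it over every line of the samples file and printing the line numbers that return problems would be one way to use it.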
I'll actively add the rest of the examples. I seem to have missed that requirement earlier. My bad.
Awesome! Keep in mind, GPT-4 has extensive knowledge of info so even if GPT-3.5 fails on this, GPT-4 might be better.
Updated with more examples. Hope this helps.
Took the liberty to run some evals. Here's what I got:
- {'accuracy': 0.4752475247524752}
- {'accuracy': 0.46534653465346537}
- {'accuracy': 0.4752475247524752}
- {'accuracy': 0.4752475247524752}
- {'accuracy': 0.4752475247524752}
GPT-4 has a lower accuracy somehow. Ran 3 tests (can't afford more) and got the following accuracies:
- accuracy: 0.27722772277227725
- accuracy: 0.26732673267326734
- accuracy: 0.26732673267326734
and this time I actually cleared the file cache before running the evals again.
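For anyone reading these numbers, a small sketch of how the reported accuracies can be turned back into correct/total counts. The helper `as_ratio` and the run labels are hypothetical. The resulting denominator is the smallest sample count consistent with each figure, which is an inference and not something stated in this thread.

```python
from fractions import Fraction

# Accuracies copied from the comments above; the labels are only for readability.
reported = {
    "first runs (model not stated)": [
        0.4752475247524752,
        0.46534653465346537,
        0.4752475247524752,
        0.4752475247524752,
        0.4752475247524752,
    ],
    "GPT-4 runs": [
        0.27722772277227725,
        0.26732673267326734,
        0.26732673267326734,
    ],
}


def as_ratio(accuracy: float, max_samples: int = 1000) -> Fraction:
    """Find the simplest correct/total ratio with total <= max_samples."""
    return Fraction(accuracy).limit_denominator(max_samples)


for label, accuracies in reported.items():
    ratios = [as_ratio(a) for a in accuracies]
    print(label, [f"{r.numerator}/{r.denominator}" for r in ratios])
```

All eight figures reduce to ratios over 101, i.e. 47 or 48 correct in the first runs and 27 or 28 correct in the GPT-4 runs, assuming the sample file had 101 lines rather than a multiple of that.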
Understood. I'll make the requested changes and get back to ya. I was also confused about the cause of the lower accuracy with GPT-4.
Closing due to inactivity, feel free to reopen when comments are addressed!