# ARC challenge
Is it possible to add the ARC challenge (1 or 2) JSONs to the evals?
https://lab42.global/essay-arc/ https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip https://github.com/fchollet/ARC
To add the ARC Challenge JSON files to your evaluation, follow these steps:

1. **Download the files:** Download the ARC-800-tasks.zip file from https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip and extract its contents to a folder.
2. **Clone the ARC repository by François Chollet:** Run the following command in your terminal or command prompt:
   ```bash
   git clone https://github.com/fchollet/ARC.git
   ```
   This will create a local copy of the ARC repository on your machine.
3. **Prepare the data:** You'll find the JSON files for ARC Challenge 1 and 2 inside the extracted zip file from step 1. You may want to preprocess them to match the format expected by your evaluation script; this depends on the specific input format your script is designed to handle (see the loading sketch after these steps).
4. **Modify the evaluation script:** Open the evaluation script you want to use with the ARC dataset, make sure it reads the JSON files from the appropriate folder, and, if necessary, adjust its data processing and evaluation steps to work with the ARC dataset format.
5. **Run the evaluation:** Execute the modified evaluation script. It should now process the ARC dataset and output the results.
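As a concrete starting point, here is a minimal sketch of loading one task, assuming the format documented in Chollet's ARC repo (each task JSON has `train` and `test` lists of `input`/`output` pairs, where a grid is a list of rows of integers 0–9); the directory path is illustrative:

```python
import json
from pathlib import Path

# Illustrative path; point this at the extracted ARC-800-tasks.zip folder
# or the cloned repo (tasks live under data/training and data/evaluation).
task_files = sorted(Path("ARC/data/training").glob("*.json"))

task = json.loads(task_files[0].read_text())

# Each task has "train" and "test" lists of {"input": grid, "output": grid},
# where a grid is a list of rows of integers 0-9 (colors).
for pair in task["train"]:
    print("demonstration input: ", pair["input"])
    print("demonstration output:", pair["output"])

for pair in task["test"]:
    print("test input:", pair["input"])  # "output" holds the ground truth
```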
### WHAT TO DO
1. Download the ARC-800-tasks.zip file and extract its contents.
2. Clone François Chollet's ARC repository to your local machine.
3. Inspect the JSON files in the extracted folder to understand their structure.
4. Modify your evaluation script to read and process the ARC dataset JSON files. This may involve preprocessing the files to match the format expected by your script (a conversion sketch follows this list).
5. Adjust data processing and evaluation steps in the script to work with the ARC dataset format.
6. Run the modified evaluation script, which should now process the ARC dataset and produce results.
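For the evals repo specifically, one simple option is to flatten each task into chat-formatted samples. This is a hedged sketch, not the format used by any merged PR: it assumes the JSONL samples layout used by evals' basic eval classes (`input` as a list of chat messages plus an `ideal` answer), and the paths and grid serialization are illustrative choices:

```python
import json
from pathlib import Path

# Hypothetical paths; adjust to your local layout.
src_dir = Path("ARC/data/training")
out_path = Path("arc_samples.jsonl")

def grid_to_str(grid):
    # Serialize a grid as rows of space-separated digits, one row per line.
    return "\n".join(" ".join(str(c) for c in row) for row in grid)

with out_path.open("w") as out:
    for task_file in sorted(src_dir.glob("*.json")):
        task = json.loads(task_file.read_text())
        # Show the train pairs as demonstrations, then ask for the test output.
        demos = "\n\n".join(
            f"Input:\n{grid_to_str(p['input'])}\nOutput:\n{grid_to_str(p['output'])}"
            for p in task["train"]
        )
        for pair in task["test"]:
            prompt = f"{demos}\n\nInput:\n{grid_to_str(pair['input'])}\nOutput:"
            sample = {
                # Chat-formatted input plus the expected answer string.
                "input": [{"role": "user", "content": prompt}],
                "ideal": grid_to_str(pair["output"]),
            }
            out.write(json.dumps(sample) + "\n")
```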
https://github.com/openai/evals/pull/317
@bhack thanks for tagging my PR here; I hadn't actually seen this issue yet!
So yes, I implemented it there, and I also have a model grader for tensor equality checking that I can contribute if it's desired.
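(Not the grader from that PR, but for illustration, an exact-match check on parsed grids could look like the following; the convention of space-separated digit rows is an assumption carried over from the serialization sketch above:)

```python
def parse_grid(text):
    # Parse rows of space-separated digits back into a list-of-lists grid.
    return [[int(c) for c in line.split()] for line in text.strip().splitlines()]

def grids_equal(predicted: str, ideal: str) -> bool:
    # Exact match: same shape and same cell values everywhere.
    try:
        return parse_grid(predicted) == parse_grid(ideal)
    except ValueError:
        return False  # unparseable model output counts as incorrect

assert grids_equal("1 2\n3 4", "1 2\n3 4")
assert not grids_equal("1 2\n3 4", "1 2\n3 5")
```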
https://github.com/openai/evals/pull/417
@bhack @theo3 We should revisit this, as there’s been a lot done since that PR was submitted
@andrew-openai Do we want a separate ticket for ConceptARC?
https://arxiv.org/abs/2305.07141 https://aiguide.substack.com/p/on-evaluating-understanding-and-generalization
https://github.com/victorvikram/ConceptARC
/cc @victorvikram
@andrew-openai Can we close this?
Looks like this is completed in #417, so I'm going to close.
https://github.com/michaelhodel/re-arc