
ARC challenge

bhack opened this issue on Mar 14 '23 · 8 comments

Is it possible to add ARC challenge (1 or 2) jsons to the eval?

https://lab42.global/essay-arc/
https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip
https://github.com/fchollet/ARC

bhack · Mar 14 '23

To add ARC Challenge JSON files to your evaluation, follow these steps:

Download the files: get the ARC-800-tasks.zip archive from https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip and extract its contents to a folder.
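For illustration, the download-and-extract step could be scripted in Python along these lines (the `arc_data/` destination is an arbitrary choice of mine, not anything the evals repo prescribes):

```python
import io
import urllib.request
import zipfile

ARC_ZIP_URL = "https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip"

# Fetch the archive into memory and extract every task JSON into a
# local folder; "arc_data" is just an illustrative destination.
with urllib.request.urlopen(ARC_ZIP_URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))
archive.extractall("arc_data")

print(archive.namelist()[:5])  # peek at the first few extracted paths
```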

Clone the ARC repository by François Chollet: run the following command in your terminal or command prompt:

```bash
git clone https://github.com/fchollet/ARC.git
```

This will create a local copy of the ARC repository on your machine.

Prepare the data: you'll find the JSON files for ARC challenges 1 and 2 inside the archive extracted in the first step. You may need to preprocess them to match the input format your evaluation script expects; that depends on the specific format the script is designed to handle.
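Every ARC task file shares the same layout: a `train` list and a `test` list of input/output pairs, where each grid is a list of rows of integers 0-9. A minimal loading sketch (the task filename is just one example from the repository's `data/training` folder):

```python
import json
from pathlib import Path

# Each ARC task JSON holds "train" and "test" lists of
# {"input": grid, "output": grid} pairs, where a grid is a
# list of rows of integers in the range 0-9.
task_path = Path("ARC/data/training/0a938d79.json")  # example task
task = json.loads(task_path.read_text())

for pair in task["train"]:
    print("input :", pair["input"])
    print("output:", pair["output"])
```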

Modify the evaluation script: open the script you want to use with the ARC dataset and make sure it reads the JSON files from the appropriate folder. If necessary, adjust its data-processing and evaluation steps to work with the ARC task format.
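For the openai/evals framework specifically, match-style evals consume JSONL samples with an `input` chat prompt and an `ideal` answer, so one plausible preprocessing step is rendering each task's train pairs as few-shot examples and its first test pair as the question. This is only a sketch under that assumption; the prompt wording and file paths are my own illustrative choices, not those of any actual PR:

```python
import json
from pathlib import Path

def grid_to_text(grid):
    """Render a grid as rows of space-separated digits."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

samples = []
for task_file in sorted(Path("ARC/data/training").glob("*.json")):
    task = json.loads(task_file.read_text())
    # Show the train pairs as few-shot demonstrations, then ask for
    # the output of the first test input. The wording is illustrative.
    demos = "\n\n".join(
        f"Input:\n{grid_to_text(p['input'])}\nOutput:\n{grid_to_text(p['output'])}"
        for p in task["train"]
    )
    test = task["test"][0]
    samples.append({
        "input": [
            {"role": "system", "content": "Infer the transformation and answer with the output grid."},
            {"role": "user", "content": f"{demos}\n\nInput:\n{grid_to_text(test['input'])}\nOutput:"},
        ],
        "ideal": grid_to_text(test["output"]),
    })

with open("arc_samples.jsonl", "w") as f:
    f.write("\n".join(json.dumps(s) for s in samples) + "\n")
```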

Run the evaluation: execute the modified evaluation script; it should now process the ARC dataset and output the results. (With the openai/evals framework, this typically means registering the eval and invoking it through the `oaieval` CLI.)


Kuonirad · Mar 15 '23

https://github.com/openai/evals/pull/317

bhack · Mar 18 '23

@bhack thanks for tagging my PR here, I hadn't actually seen this issue yet!

So yes, I implemented it there, and I also have a model grader for tensor equality checking that I can contribute if it's desired.
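For context, the deterministic core of such a check (this is a simple exact-match sketch, not mkirch's actual model grader) parses the model's text output back into a grid and compares cell by cell, assuming a space-separated rendering like the one sketched earlier:

```python
def parse_grid(text: str) -> list[list[int]]:
    """Parse space-separated rows of digits back into a grid of ints."""
    return [[int(cell) for cell in row.split()] for row in text.strip().splitlines()]

def grids_equal(predicted_text: str, ideal_text: str) -> bool:
    """Exact match: shapes and every cell must agree."""
    try:
        return parse_grid(predicted_text) == parse_grid(ideal_text)
    except ValueError:
        return False  # the model output was not a well-formed grid

assert grids_equal("1 2\n3 4", "1 2\n3 4")
assert not grids_equal("1 2\n3 4", "1 2\n3 5")
```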

mkirch · Mar 18 '23

https://github.com/openai/evals/pull/417

bhack · May 03 '23

@bhack @theo3 We should revisit this, as there’s been a lot done since that PR was submitted

mkirch · May 04 '23

@andrew-openai Do we want a separate ticket for ConceptARC?

https://arxiv.org/abs/2305.07141
https://aiguide.substack.com/p/on-evaluating-understanding-and-generalization

bhack · May 16 '23

https://github.com/victorvikram/ConceptARC

/cc @victorvikram

bhack · May 16 '23

@andrew-openai Can we close this?

bhack · Jun 01 '23

Looks like this was completed in #417, so I'm going to close.

etr2460 · Dec 05 '23

https://github.com/michaelhodel/re-arc

bhack · Apr 12 '24