parsel Successful reproduction of the experiments on APPS by pure GPT3.5

Open wyt2000 opened this issue 1 year ago • 2 comments

Since Codex has been deprecated by OpenAI, I tried to reproduce the APPS experiments from the Parsel paper using only GPT-3.5. Thanks to the code in the saycan branch, I fully understood your evaluation method. After considerable effort modifying the prompts and Parsel itself, I reproduced part of the experiments described in Section 3.1 of the paper and even obtained better results: the GPT-3.5-only version of Parsel (8x16) solved 27 of 100 randomly sampled competition-level problems from APPS. I am sharing the modified code for anyone who wants to use it in the future.

Sep 17 '23 12:09 wyt2000
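The result above is reported as "solved 27 of 100 randomly sampled problems". Below is a minimal sketch of how such a count could be tallied. It assumes an APPS-style strict criterion, where a problem counts as solved only if at least one candidate program passes every one of its test cases. The function names (`sample_problems`, `is_solved`, `solve_count`), the seed, and the shape of `results` are hypothetical. They are not taken from the Parsel or Automatic-ANPL code.

```python
import random
from typing import Dict, List, Sequence


def sample_problems(problem_ids: Sequence[str], k: int = 100, seed: int = 0) -> List[str]:
    """Draw k distinct problem ids reproducibly (the fixed seed is a hypothetical choice)."""
    rng = random.Random(seed)
    return rng.sample(list(problem_ids), k)


def is_solved(candidate_results: List[List[bool]]) -> bool:
    """Assumed strict criterion: some candidate must pass all of its test cases.

    A candidate with an empty test list is NOT treated as passing.
    """
    return any(len(tests) > 0 and all(tests) for tests in candidate_results)


def solve_count(results: Dict[str, List[List[bool]]]) -> int:
    """Count problems marked solved; results maps problem id -> per-candidate test outcomes."""
    return sum(is_solved(candidates) for candidates in results.values())


# Hypothetical outcomes for three problems.
results = {
    "p1": [[True, False], [True, True]],  # second candidate passes every test -> solved
    "p2": [[False, True]],                # no candidate passes every test -> unsolved
    "p3": [[], [True]],                   # empty test list does not count; second candidate does
}
print(solve_count(results), "of", len(results))  # prints: 2 of 3
```

Under this assumption, a "27 of 100" result means that 27 of the sampled problems had at least one fully passing program. The real evaluation may also select a single final program per problem before checking it, which would be stricter.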

Hey Yutong, could you share the modifications related to the evaluation as well? I'm trying to reproduce the APPS results (27/100) described in your post.

Apr 24 '24 23:04 PatrickHua

Sorry, it has been a long time and I have forgotten many details of the evaluation. See https://github.com/wyt2000/Automatic-ANPL/tree/apps for help.

May 30 '24 02:05 wyt2000
