Results 2 gemini-benchmark issues

The filtering of final strings in codegen is a little buggy (I made a temporary patch to get results quickly, but it should be fixed permanently). https://github.com/neulab/gemini-benchmark/blob/fe7a80c9f4423bdca529dfd18691d060f5d61e6e/benchmarking/Code/run_code.py#L126-L130

Hi! Thanks for this great and extensive benchmarking work. I was looking at Section 6 in the paper and found this graph intriguing since it states that `gpt3.5-turbo` is better...