gemini-benchmark
Double-check filtering of final strings in codegen
The filtering of final strings in codegen is a little buggy. I made a temporary patch to get results quickly, but it should be fixed permanently: https://github.com/neulab/gemini-benchmark/blob/fe7a80c9f4423bdca529dfd18691d060f5d61e6e/benchmarking/Code/run_code.py#L126-L130