gemini-benchmark issues

Results 2 gemini-benchmark issues


Double-check filtering of final strings in codegen

The filtering of final strings in codegen is a little buggy (I made a temporary patch to get results quickly, but it should be fixed permanently). https://github.com/neulab/gemini-benchmark/blob/fe7a80c9f4423bdca529dfd18691d060f5d61e6e/benchmarking/Code/run_code.py#L126-L130

neubig

Code Generation Evals should parse code from LM response

Hi! Thanks for this great and extensive benchmarking work. I was looking at Section 6 in the paper and found this graph intriguing since it states that `gpt3.5-turbo` is better...

manishshettym
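
To illustrate what "parse code from LM response" could mean in practice, here is a minimal sketch that pulls the first fenced code block out of a chat-style reply before it is scored. The helper name `extract_code` and the fallback behaviour are assumptions made for illustration; they are not taken from the gemini-benchmark repository or from the issue text.

```python
import re

# Build the fence marker programmatically so this example can itself sit
# inside a fenced block.
FENCE = "`" * 3

# Hypothetical pattern: opening fence, optional language tag, newline,
# then everything up to the next closing fence (non-greedy).
_CODE_BLOCK = re.compile(
    FENCE + r"[ \t]*[\w+-]*[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code(response: str) -> str:
    """Return the body of the first fenced code block in an LM response.

    If the response has no fenced block, fall back to the whole response
    with surrounding whitespace removed (an assumption; a real evaluation
    might prefer to flag such cases instead).
    """
    match = _CODE_BLOCK.search(response)
    if match is None:
        return response.strip()
    # Drop only the newlines around the block so indentation is preserved.
    return match.group(1).strip("\n")


reply = (
    "Sure, here is the function:\n"
    + FENCE + "python\n"
    + "def add(a, b):\n"
    + "    return a + b\n"
    + FENCE + "\n"
    + "Hope this helps!"
)
print(extract_code(reply))
```

Scoring the extracted block rather than the raw reply keeps conversational text around the code from being counted against the model.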
