
🐛 [TestRemoval/TestRepair] - 211, 215 - Include status code in mock response

Open dmelcer9 opened this issue 1 year ago • 3 comments

EvalPlus version

v0_1_0_hf

Output of running ls ~/.cache/bigcodebench

BigCodeBench-v0.1.0_hf.jsonl

Task ID of the programming task

BigCodeBench/211, BigCodeBench/215, probably some others as well

The original test

(All tests)
    mock_response = MagicMock()
    mock_response.content = MOCK_CONTENT
    mock_requests_get.return_value = mock_response

Your proposed new test

    mock_response = MagicMock()
    mock_response.content = MOCK_CONTENT
    mock_response.status_code = 200
    mock_requests_get.return_value = mock_response

Description

The LLM sometimes (reasonably!) generates code like:

    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None

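(The rest of the code solves the task correctly.)

But this fails the test: the mock never sets status_code, so r.status_code is an auto-created MagicMock attribute that compares unequal to 200, and the function returns None before doing any work.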
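The failure can be reproduced in isolation with a minimal sketch. Everything here is illustrative rather than taken from the dataset: the `download` function, the `MOCK_CONTENT` value, and the URL are hypothetical stand-ins for a generated solution and the test fixture.

```python
from unittest.mock import MagicMock

# Hypothetical placeholder; the real tests define their own MOCK_CONTENT.
MOCK_CONTENT = b"mock content"


def download(get, url):
    """Hypothetical stand-in for a generated solution that checks the status code."""
    r = get(url)
    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None
    return r.content


# Original test setup: status_code is never assigned, so accessing it
# yields an auto-created MagicMock, which is not equal to 200.
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_requests_get = MagicMock(return_value=mock_response)
result_without_status = download(mock_requests_get, "https://example.com/file")

# Proposed test setup: with status_code set, the guard passes.
mock_response.status_code = 200
result_with_status = download(mock_requests_get, "https://example.com/file")
```

With the original mock, `result_without_status` is `None` because the guard triggers; after setting `status_code = 200`, the same function returns the mocked content.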

Other context

No response

dmelcer9, Jul 26 '24 14:07

Thanks @dmelcer9! It makes sense :) We didn't think about this when developing the initial tasks. We will incorporate this change in the next dataset release.

terryyz, Jul 26 '24 17:07

@dmelcer9 which model did you use? I'd like to verify resolution in #49.

hvaara, Sep 14 '24 22:09

Not 100% sure, but I believe this was with Starcoder2-15b; the temperature was somewhere between 0.7 and 1.

dmelcer9, Sep 16 '24 13:09