bigcodebench
🐛 [TestRemoval/TestRepair] - 211, 215 - include status code in mock response
EvalPlus version
v0_1_0_hf
Output of running ls ~/.cache/bigcodebench
BigCodeBench-v0.1.0_hf.jsonl
Task ID of the programming task
BigCodeBench/211, BigCodeBench/215, probably some others as well
The original test
(All tests)
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_requests_get.return_value = mock_response
Your proposed new test
mock_response = MagicMock()
mock_response.content = MOCK_CONTENT
mock_response.status_code = 200
mock_requests_get.return_value = mock_response
Description
The LLM sometimes (reasonably!) generates code like:
if r.status_code != 200:
    print("Error: Failed to download file from URL.")
    return None
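(The rest of the code solves the task correctly.)
But it fails the test: on a bare MagicMock, status_code is itself a MagicMock, so the != 200 check is always true and the function returns None.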
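A minimal sketch of the failure, runnable without network access. The helper name fetch_content, the http_get parameter, the example URL, and the MOCK_CONTENT value are hypothetical stand-ins for the generated solution and the test fixture; they are not taken from the BigCodeBench tasks themselves.

```python
from unittest.mock import MagicMock

MOCK_CONTENT = b"fake file bytes"  # placeholder, not the benchmark's real fixture


def fetch_content(http_get, url):
    """Hypothetical model-written helper that checks the status before reading the body."""
    r = http_get(url)
    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None
    return r.content


# Original mock: status_code is never set, so attribute access yields a new
# MagicMock, and comparing that to 200 with != evaluates to True.
bare_response = MagicMock()
bare_response.content = MOCK_CONTENT
original_result = fetch_content(
    MagicMock(return_value=bare_response), "https://example.com/file"
)

# Proposed mock: status_code is set explicitly, so the check passes.
fixed_response = MagicMock()
fixed_response.content = MOCK_CONTENT
fixed_response.status_code = 200
fixed_result = fetch_content(
    MagicMock(return_value=fixed_response), "https://example.com/file"
)
```

With the original mock the helper takes the error branch and returns None; with status_code set to 200 it returns the mocked content.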
Other context
No response
Thanks @dmelcer9! It makes sense :) We didn't think about this when developing the initial tasks. We will incorporate this change in the next dataset release.
@dmelcer9 which model did you use? I'd like to verify resolution in #49.
Not 100% sure, but I believe this was with Starcoder2-15b; the temperature was somewhere between 0.7 and 1.