[Bug][AutogenBench]: import error in HumanEval test script

Status: Open • LinxinS97 opened this issue 11 months ago • 1 comment

Describe the bug

The current HumanEval scenario replaces a placeholder in `coding/my_test.py` with the test script from the HumanEval dataset. However, this causes an import error: helper functions defined in the prompt are not visible inside `my_test.py`. As a result, some problems cannot be solved, and the test always outputs "SOME TESTS FAILED - TRY AGAIN !#!#".

For example, the following script is from HumanEval_38:

### Code in the prompt
```python
from my_test import run_tests

def encode_cyclic(s: str):
    """
    returns encoded string by cycling groups of three characters.
    """
    # split string to groups. Each of length 3.
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    # cycle elements in each group. Unless group has fewer elements than 3.
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)


def decode_cyclic(s: str):
    """
    takes as input string encoded with encode_cyclic function. Returns decoded string.
    """
    # (body to be completed by the agent)

run_tests(decode_cyclic)
```

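Here `encode_cyclic` is predefined in the prompt so it can be used to solve the problem. However, `check` in `my_test.py` (called by `run_tests`) also calls `encode_cyclic`, which is defined only in the prompt script and never imported into `my_test.py`. Because Python resolves global names in the module where a function is defined, the call raises a `NameError`; the bare `except:` in `run_tests` catches it, so the test fails even for a correct `decode_cyclic`.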
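The scoping problem can be reproduced with the standard library alone. This is a minimal sketch under a few assumptions: `my_test` is built in memory with `types.ModuleType` and `exec` (so the example fits in one file) rather than written to disk, only `check` is kept from the harness, and `check` is called directly so the `NameError` is not swallowed by the bare `except:`. The `decode_cyclic` used here is correct: cycling a 3-character group three times restores it, so two further encodings undo one encoding. The test still cannot run, because `encode_cyclic` exists only in the caller's module.

```python
import types

# Trimmed stand-in for coding/my_test.py, built in memory (an assumption made
# so the example is self-contained; the benchmark writes a real file instead).
MY_TEST_SRC = r'''
def check(candidate):
    from random import randint, choice
    import string

    letters = string.ascii_lowercase
    for _ in range(100):
        s = ''.join(choice(letters) for _ in range(randint(10, 20)))
        # encode_cyclic is looked up in THIS module's globals, where it is missing.
        assert candidate(encode_cyclic(s)) == s
'''

my_test = types.ModuleType("my_test")
exec(MY_TEST_SRC, my_test.__dict__)


# Prompt side: encode_cyclic is defined here, in the caller's module only.
def encode_cyclic(s: str) -> str:
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)


def decode_cyclic(s: str) -> str:
    # Correct solution: two more encodings undo one encoding.
    return encode_cyclic(encode_cyclic(s))


def try_check(candidate) -> str:
    """Run check directly and report the error instead of hiding it."""
    try:
        my_test.check(candidate)
        return "passed"
    except NameError as exc:
        return f"NameError: {exc}"


result = try_check(decode_cyclic)
print(result)  # NameError: name 'encode_cyclic' is not defined
```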

### Code in coding/my_test.py
```python
METADATA = {}

def check(candidate):
    from random import randint, choice
    import string

    letters = string.ascii_lowercase
    for _ in range(100):
        str = ''.join(choice(letters) for i in range(randint(10, 20)))
        encoded_str = encode_cyclic(str)  # encode_cyclic cannot be found
        assert candidate(encoded_str) == str

def run_tests(candidate):
    try:
        check(candidate)
        print("ALL TESTS PASSED !#!#\nTERMINATE")
    except:
        print("SOME TESTS FAILED - TRY AGAIN !#!#")
```
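One possible workaround is to let `run_tests` evaluate `check` with access to the names defined alongside the candidate. The sketch below is hypothetical, not the fix adopted upstream: it rebuilds `check` with `types.FunctionType(check.__code__, scope)`, where `scope` merges `candidate.__globals__` (the prompt script's globals, which contain `encode_cyclic`) with the test module's own globals, letting the test module's names win on collisions. As before, `my_test` is built in memory via `exec` so the example is self-contained; the patched `run_tests` also returns a boolean (added only so the result can be checked) and catches `Exception` instead of using a bare `except:`.

```python
import types

# Stand-in for coding/my_test.py, built in memory so the sketch fits in one file.
# run_tests below is a HYPOTHETICAL patched version, not the benchmark's code.
MY_TEST_SRC = r'''
def check(candidate):
    from random import randint, choice
    import string

    letters = string.ascii_lowercase
    for _ in range(100):
        s = ''.join(choice(letters) for _ in range(randint(10, 20)))
        assert candidate(encode_cyclic(s)) == s


def run_tests(candidate):
    import types
    # Merge the candidate's module globals (where encode_cyclic lives) with this
    # module's own globals; names defined here take precedence.
    scope = {**candidate.__globals__, **globals()}
    # Rebuild check around the merged globals so encode_cyclic resolves.
    scoped_check = types.FunctionType(check.__code__, scope)
    try:
        scoped_check(candidate)
        print("ALL TESTS PASSED !#!#\nTERMINATE")
        return True  # added for illustration; the original returns None
    except Exception:
        print("SOME TESTS FAILED - TRY AGAIN !#!#")
        return False
'''

my_test = types.ModuleType("my_test")
exec(MY_TEST_SRC, my_test.__dict__)


# Prompt side: helper provided by the problem, plus a correct solution.
def encode_cyclic(s: str) -> str:
    groups = [s[(3 * i):min((3 * i + 3), len(s))] for i in range((len(s) + 2) // 3)]
    groups = [(group[1:] + group[0]) if len(group) == 3 else group for group in groups]
    return "".join(groups)


def decode_cyclic(s: str) -> str:
    # Cycling a 3-character group three times restores it.
    return encode_cyclic(encode_cyclic(s))


passed = my_test.run_tests(decode_cyclic)
```

An alternative with the same effect would be to copy the prompt's helper definitions into `my_test.py` ahead of the dataset's test code. Rebinding globals avoids touching the prompt text, but it relies on the candidate being a plain Python function (one that has a `__globals__` attribute) defined in the same module as the helpers.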

Steps to reproduce

Run problem No. 38 (HumanEval_38) in the HumanEval scenario.

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

LinxinS97 • Mar 20 '24 08:03

Good catch. Thanks for reporting.

afourney • Mar 20 '24 13:03