Execution Prediction

Open bmosaicml opened this issue 1 year ago • 3 comments

What does this PR do?

This PR introduces the execution prediction task. It is an auxiliary task compatible with any code evaluation dataset that requires the model to inspect a piece of code, and complete assert test statements by predicting the code's output on a given input.

Tested with this run: exec-prediction-has8Hy

It is slightly more challenging than human eval but still meaningful signal for 30B models.

| Category   | Benchmark                       | Subtask   |   Accuracy | Number few shot   | Model                    |
|:-----------|:--------------------------------|:----------|-----------:|:------------------|:-------------------------|
|            | human_eval_execution_prediction |           |   0.170163 | 3-shot            | mosaicml/mpt-7b-instruct |

Below is an example of how the execution prediction task is formatted:

"""
Below is a list of python functions each followed by a correct assert statement testing its behavior. The final assert statement is incomplete; your task is to complete the final assert statement so that it passes.
"""

####

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx!= idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False


def test0():
        assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False

####

from typing import List


def filter_by_substring(strings: List[str], substring: str) -> List[str]:
    """ Filter an input list of strings only for ones that contain given substring
    >>> filter_by_substring([], 'a')
    []
    >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')
    ['abc', 'bacd', 'array']
    """
    return [x for x in strings if substring in x]


def test1():
        assert filter_by_substring(["grunt", "trumpet", "prune", "gruesome"], "run") == ['grunt', 'prune']

####

from typing import List


def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
    result = []
    current_string = []
    current_depth = 0

    for c in paren_string:
        if c == '(':
            current_depth += 1
            current_string.append(c)
        elif c == ')':
            current_depth -= 1
            current_string.append(c)

            if current_depth == 0:
                result.append(''.join(current_string))
                current_string.clear()

    return result


def test():
        assert separate_paren_groups("(()()) ((())) () ((())()())") ==

The model would then be expected to continue the line such that the test function succeeds.

What issue(s) does this change relate to?

Before submitting

[ ] Have you read the contributor guidelines?
[ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
[ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
[ ] Did you update any related docs and document your change?
[ ] Did you update any related tests and add any new tests related to your change? (see testing)
[ ] Did you run the tests locally to make sure they pass?
[ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

Oct 19 '23 20:10 bmosaicml

composer composer copied to clipboard

Execution Prediction

What does this PR do?

What issue(s) does this change relate to?

Before submitting

composer
composer copied to clipboard