Execution Prediction
What does this PR do?
This PR introduces the execution prediction task, an auxiliary task compatible with any code evaluation dataset. The model must inspect a piece of code and complete an assert statement by predicting the code's output on a given input.
Tested with this run: exec-prediction-has8Hy
It is slightly more challenging than HumanEval but still provides a meaningful signal for 30B models.
| Category | Benchmark | Subtask | Accuracy | Number few shot | Model |
|:-----------|:--------------------------------|:----------|-----------:|:------------------|:-------------------------|
| | human_eval_execution_prediction | | 0.170163 | 3-shot | mosaicml/mpt-7b-instruct |
Below is an example of how the execution prediction task is formatted:
"""
Below is a list of python functions each followed by a correct assert statement testing its behavior. The final assert statement is incomplete; your task is to complete the final assert statement so that it passes.
"""
####
from typing import List

def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True
    return False

def test0():
    assert has_close_elements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False
####
from typing import List

def filter_by_substring(strings: List[str], substring: str) -> List[str]:
    """ Filter an input list of strings only for ones that contain given substring
    >>> filter_by_substring([], 'a')
    []
    >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')
    ['abc', 'bacd', 'array']
    """
    return [x for x in strings if substring in x]

def test1():
    assert filter_by_substring(["grunt", "trumpet", "prune", "gruesome"], "run") == ['grunt', 'prune']
####
from typing import List

def separate_paren_groups(paren_string: str) -> List[str]:
    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
    separate those group into separate strings and return the list of those.
    Separate groups are balanced (each open brace is properly closed) and not nested within each other
    Ignore any spaces in the input string.
    >>> separate_paren_groups('( ) (( )) (( )( ))')
    ['()', '(())', '(()())']
    """
    result = []
    current_string = []
    current_depth = 0
    for c in paren_string:
        if c == '(':
            current_depth += 1
            current_string.append(c)
        elif c == ')':
            current_depth -= 1
            current_string.append(c)
            if current_depth == 0:
                result.append(''.join(current_string))
                current_string.clear()
    return result

def test():
    assert separate_paren_groups("(()()) ((())) () ((())()())") ==
The model is then expected to continue the final line so that the test function passes.
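For reviewers who want to reason about the format, here is a minimal sketch of how an assert statement could be split into a prompt prefix and an expected continuation, and how a model completion could be checked by running the completed assert. The helper names `split_assert` and `completion_passes` are hypothetical and are not taken from this PR; the actual implementation may differ. The sketch assumes asserts of the form `assert <call> == <value>` and uses `exec` without sandboxing, which would not be safe for untrusted model output in a real harness.

```python
from typing import Tuple


def split_assert(assert_line: str) -> Tuple[str, str]:
    """Split 'assert f(x) == y' into ('assert f(x) ==', 'y').

    Hypothetical helper. It splits on the LAST '==' in the line, so an
    expected value that itself contains '==' would be split incorrectly.
    """
    prefix, sep, expected = assert_line.rpartition("==")
    if not sep:
        raise ValueError("expected an assert of the form 'assert <call> == <value>'")
    return prefix.rstrip() + " ==", expected.strip()


def completion_passes(function_source: str, assert_prefix: str, completion: str) -> bool:
    """Return True if the first line of `completion` makes the assert pass.

    Hypothetical helper. Only the first line of the completion is used,
    since the model may keep generating text after finishing the assert.
    """
    stripped = completion.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    test_source = f"{function_source}\n{assert_prefix} {first_line}\n"
    namespace: dict = {}
    try:
        # Unsafe on untrusted input; a real harness would sandbox this.
        exec(test_source, namespace)
    except Exception:
        # Covers AssertionError, SyntaxError, and runtime errors alike.
        return False
    return True


# Usage with one of the functions from the example prompt above.
FUNCTION_SOURCE = (
    "def filter_by_substring(strings, substring):\n"
    "    return [x for x in strings if substring in x]\n"
)
ASSERT_LINE = (
    'assert filter_by_substring(["grunt", "trumpet", "prune", "gruesome"], "run")'
    " == ['grunt', 'prune']"
)

prefix, expected = split_assert(ASSERT_LINE)
print(prefix)
print(completion_passes(FUNCTION_SOURCE, prefix, expected))
print(completion_passes(FUNCTION_SOURCE, prefix, "['grunt']"))
```

Checking by execution rather than by string match means that equivalent spellings of the same value (for example, different quote styles) would count as correct under this sketch.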
What issue(s) does this change relate to?
Before submitting
- [ ] Have you read the contributor guidelines?
- [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
- [ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
- [ ] Did you update any related docs and document your change?
- [ ] Did you update any related tests and add any new tests related to your change? (see testing)
- [ ] Did you run the tests locally to make sure they pass?
- [ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)