composer
Add code eval dataset and metric
What does this PR do?
Adds in-context learning (ICL) datasets and metrics for code evaluation, along with unit tests.
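For readers unfamiliar with code-eval metrics, here is a minimal sketch of the general idea: a generated completion counts as correct only if the resulting function passes the problem's unit tests. This is not the implementation in this PR. The helper names `passes_tests` and `code_eval_accuracy` are hypothetical, and the sample fields (`prompt`, `completion`, `test`, `entry_point`) assume a HumanEval-style layout in which the test string defines `check(candidate)`.

```python
from typing import Dict, List


def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion defines a function that passes `test`.

    Assumption: `test` defines `check(candidate)`, as in HumanEval-style data.
    Running untrusted code with exec() is unsafe; a real harness would sandbox
    execution and enforce a timeout.
    """
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)  # define the candidate function
        exec(test, namespace)                 # define check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as failures.
        return False
    return True


def code_eval_accuracy(samples: List[Dict[str, str]]) -> float:
    """Fraction of samples whose completion passes its unit tests."""
    if not samples:
        return 0.0
    passed = sum(passes_tests(**sample) for sample in samples)
    return passed / len(samples)


# Hypothetical toy problem: one correct and one incorrect completion.
samples = [
    {
        "prompt": "def add(a, b):\n",
        "completion": "    return a + b\n",
        "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
        "entry_point": "add",
    },
    {
        "prompt": "def add(a, b):\n",
        "completion": "    return a - b\n",
        "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
        "entry_point": "add",
    },
]
print(code_eval_accuracy(samples))
```

The sketch scores one completion per problem; it does not cover multiple generations per problem or distributed evaluation.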
What issue(s) does this change relate to?
Manual test:
`******************************
Config:
node_name: a100-40sxm-h15-02
num_gpus_per_node: 4
num_nodes: 1
rank_zero_seed: 2021406292

[Eval batch=1/11] Eval on humaneval/0-shot data
[Eval batch=2/11] Eval on humaneval/0-shot data
[Eval batch=3/11] Eval on humaneval/0-shot data
[Eval batch=4/11] Eval on humaneval/0-shot data
[Eval batch=5/11] Eval on humaneval/0-shot data
[Eval batch=6/11] Eval on humaneval/0-shot data
[Eval batch=7/11] Eval on humaneval/0-shot data
[Eval batch=8/11] Eval on humaneval/0-shot data
[Eval batch=9/11] Eval on humaneval/0-shot data
[Eval batch=10/11] Eval on humaneval/0-shot data
/mnt/workdisk/rishab/composer/composer/core/data_spec.py:35: UserWarning: Cannot split tensor of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
  warnings.warn(f'Cannot split tensor of length {len(t)} into batches of size {microbatch_size}. '
/mnt/workdisk/rishab/composer/composer/core/data_spec.py:26: UserWarning: Cannot split list of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
  warnings.warn(f'Cannot split list of length {len(l)} into batches of size {microbatch_size}. '
[Eval batch=11/11] Eval on humaneval/0-shot data:
    Eval metrics/humaneval/0-shot/InContextLearningCodeEvalAccuracy: 0.1159
Ran eval in: 846.955498456955 seconds
metrics/humaneval/0-shot/InContextLearningCodeEvalAccuracy: 0.11585365980863571`