llm-foundry
eval.py error while benchmarking T5
Console
[Eval batch=1/1289] Eval on lambada_openai/0-shot data
[Eval batch=130/1289] Eval on lambada_openai/0-shot data
[Eval batch=259/1289] Eval on lambada_openai/0-shot data
[Eval batch=387/1289] Eval on lambada_openai/0-shot data
[Eval batch=516/1289] Eval on lambada_openai/0-shot data
[Eval batch=645/1289] Eval on lambada_openai/0-shot data
[Eval batch=774/1289] Eval on lambada_openai/0-shot data
[Eval batch=903/1289] Eval on lambada_openai/0-shot data
[Eval batch=1031/1289] Eval on lambada_openai/0-shot data
[Eval batch=1160/1289] Eval on lambada_openai/0-shot data
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:35: UserWarning: Cannot split tensor of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split tensor of length {len(t)} into batches of size {microbatch_size}. '
/home/codeless/Desktop/llm-foundry/mosaic/lib/python3.10/site-packages/composer/core/data_spec.py:26: UserWarning: Cannot split list of length 1 into batches of size 4. As it is smaller, no splitting will be done. This may happen on the last batch of a dataset if it is a smaller size than the microbatch size.
warnings.warn(f'Cannot split list of length {len(l)} into batches of size {microbatch_size}. '
[Eval batch=1289/1289] Eval on lambada_openai/0-shot data
[Eval batch=1/919] Eval on piqa/10-shot data
[Eval batch=93/919] Eval on piqa/10-shot data
[Eval batch=185/919] Eval on piqa/10-shot data
[Eval batch=276/919] Eval on piqa/10-shot data
[Eval batch=368/919] Eval on piqa/10-shot data
[Eval batch=460/919] Eval on piqa/10-shot data
[Eval batch=552/919] Eval on piqa/10-shot data
[Eval batch=644/919] Eval on piqa/10-shot data
[Eval batch=735/919] Eval on piqa/10-shot data
[Eval batch=827/919] Eval on piqa/10-shot data
[Eval batch=919/919] Eval on piqa/10-shot data
[Eval batch=1/10042] Eval on hellaswag/10-shot data
[Eval batch=1005/10042] Eval on hellaswag/10-shot data
[Eval batch=2009/10042] Eval on hellaswag/10-shot data
[Eval batch=3013/10042] Eval on hellaswag/10-shot data
[Eval batch=4017/10042] Eval on hellaswag/10-shot data
[Eval batch=5022/10042] Eval on hellaswag/10-shot data
[Eval batch=6026/10042] Eval on hellaswag/10-shot data
[Eval batch=7030/10042] Eval on hellaswag/10-shot data
[Eval batch=8034/10042] Eval on hellaswag/10-shot data
[Eval batch=9038/10042] Eval on hellaswag/10-shot data
[Eval batch=10042/10042] Eval on hellaswag/10-shot data
[Eval batch=1/2376] Eval on arc_easy/10-shot data
[Eval batch=238/2376] Eval on arc_easy/10-shot data
[Eval batch=476/2376] Eval on arc_easy/10-shot data
[Eval batch=714/2376] Eval on arc_easy/10-shot data
[Eval batch=951/2376] Eval on arc_easy/10-shot data
[Eval batch=1188/2376] Eval on arc_easy/10-shot data
[Eval batch=1426/2376] Eval on arc_easy/10-shot data
[Eval batch=1664/2376] Eval on arc_easy/10-shot data
[Eval batch=1901/2376] Eval on arc_easy/10-shot data
[Eval batch=2138/2376] Eval on arc_easy/10-shot data
[Eval batch=2376/2376] Eval on arc_easy/10-shot data
[Eval batch=1/1172] Eval on arc_challenge/10-shot data
[Eval batch=118/1172] Eval on arc_challenge/10-shot data
[Eval batch=235/1172] Eval on arc_challenge/10-shot data
[Eval batch=352/1172] Eval on arc_challenge/10-shot data
[Eval batch=469/1172] Eval on arc_challenge/10-shot data
[Eval batch=586/1172] Eval on arc_challenge/10-shot data
[Eval batch=704/1172] Eval on arc_challenge/10-shot data
[Eval batch=821/1172] Eval on arc_challenge/10-shot data
[Eval batch=938/1172] Eval on arc_challenge/10-shot data
[Eval batch=1055/1172] Eval on arc_challenge/10-shot data
[Eval batch=1172/1172] Eval on arc_challenge/10-shot data
[Eval batch=1/50] Eval on copa/0-shot data
[Eval batch=6/50] Eval on copa/0-shot data
[Eval batch=11/50] Eval on copa/0-shot data
[Eval batch=16/50] Eval on copa/0-shot data
[Eval batch=21/50] Eval on copa/0-shot data
[Eval batch=26/50] Eval on copa/0-shot data
[Eval batch=30/50] Eval on copa/0-shot data
[Eval batch=35/50] Eval on copa/0-shot data
[Eval batch=40/50] Eval on copa/0-shot data
[Eval batch=45/50] Eval on copa/0-shot data
[Eval batch=50/50] Eval on copa/0-shot data
[Eval batch=1/1635] Eval on boolq/10-shot data
[Eval batch=164/1635] Eval on boolq/10-shot data
[Eval batch=328/1635] Eval on boolq/10-shot data
[Eval batch=491/1635] Eval on boolq/10-shot data
[Eval batch=655/1635] Eval on boolq/10-shot data
[Eval batch=818/1635] Eval on boolq/10-shot data
[Eval batch=981/1635] Eval on boolq/10-shot data
[Eval batch=1145/1635] Eval on boolq/10-shot data
[Eval batch=1308/1635] Eval on boolq/10-shot data
[Eval batch=1472/1635] Eval on boolq/10-shot data
[Eval batch=1635/1635] Eval on boolq/10-shot data
Ran google/flan-t5-xl eval in: 13817.477584123611 seconds
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/codeless/Desktop/llm-foundry/scripts/eval/eval.py:252 in
To reproduce
I pip-installed mosaicml and the llm-foundry requirements yesterday, then ran the eval.py script on a flan-t5-xl model following the quickstart guide. The only changes I made were setting max_seq_len and icl_seq_len to 512, model_name_or_path to google/flan-t5-xl, and the model name to hf_t5, in hf_eval.yaml and tasks_light.yaml.
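Roughly, the changes were as follows (I'm reconstructing these from memory, so the exact key nesting may differ slightly from the shipped hf_eval.yaml and tasks_light.yaml):

```yaml
# hf_eval.yaml -- only the fields I changed; everything else left at quickstart defaults
max_seq_len: 512
model_name_or_path: google/flan-t5-xl
models:
- model_name: ${model_name_or_path}
  model:
    name: hf_t5          # changed from the default causal-LM model type
    pretrained_model_name_or_path: ${model_name_or_path}

# tasks_light.yaml -- in-context-learning sequence length for each task
icl_seq_len: 512
```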
Expected behavior
Successful benchmarking.
Additional context
I can't figure out why it couldn't find the key in the logger. I don't have the experience to dig into it further, so I hope this information is enough for you to figure out what's wrong.
By the way, where are the benchmark results saved?
cc @bmosaicml, who worked on the evaluation code, to take a look.