Megatron-DeepSpeed
BigScience Eval Harness
Changes:
- Adds compatibility with the BigScience fork of lm-evaluation-harness.
- Reduces the memory load by offloading to CPU earlier, thanks to @thomasw21!
Notes:
- Almost the same as the existing evaluate functionality, but with some changes to the `evaluate.py` script, as the BigScience fork has diverged from the original evaluation harness. :)
- Sorry for the long commit history - I will squash when merging.
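The "offloading to CPU earlier" change can be sketched roughly as below. This is not the actual Megatron-DeepSpeed code; `model_call_offloaded` and its arguments are hypothetical stand-ins for the loop inside `_model_call`. The idea is to move each micro-batch's logits to the CPU as soon as they are produced, so `torch.cat` never has to allocate one large result buffer on the GPU (the ~15 GiB allocation in the OOM traceback below).

```python
import torch

def model_call_offloaded(micro_batches, model):
    """Run `model` on each micro-batch and gather the logits on the CPU.

    Hypothetical sketch: the real evaluate.py splits the concatenated
    inputs by micro_bs_multiplier before a loop like this one.
    """
    outputs = []
    with torch.no_grad():
        for batch in micro_batches:
            logits = model(batch)
            # Key change: copy each chunk to host memory *before*
            # concatenating, instead of concatenating on the GPU and
            # moving the big tensor afterwards.
            outputs.append(logits.cpu())
    # The concatenation now allocates CPU memory, not GPU memory.
    return torch.cat(outputs, dim=0)

# Tiny usage example with a toy "model" (CPU-only here):
toy_model = torch.nn.Linear(4, 2)
batches = [torch.randn(3, 4) for _ in range(2)]
out = model_call_offloaded(batches, toy_model)
print(out.shape)  # torch.Size([6, 2])
```

The trade-off is an extra device-to-host copy per micro-batch, which is cheap next to the forward pass and avoids the peak-memory spike at concatenation time.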
Re: memory, with micro_bs_multiplier=16. Now:
```
[default6]:Running loglikelihood requests
[default7]:Running loglikelihood requests
[default0]:  0%|          | 0/6540 [00:00<?, ?it/s]
[default0]:  0%|          | 16/6540 [00:04<31:39,  3.44it/s]
[default0]:  0%|          | 32/6540 [00:20<1:15:48,  1.43it/s]
[default0]:  1%|          | 48/6540 [00:29<1:09:21,  1.56it/s]
[default0]:  1%|          | 64/6540 [00:36<1:01:34,  1.75it/s]
[default0]:  1%|          | 80/6540 [00:43<55:39,  1.93it/s]
[default0]:  1%|▏         | 96/6540 [00:50<51:09,  2.10it/s]
[default0]:  2%|▏         | 112/6540 [00:56<47:33,  2.25it/s]
[default0]:  2%|▏         | 128/6540 [01:01<44:25,  2.41it/s]
[default0]:  2%|▏         | 144/6540 [01:07<42:08,  2.53it/s]
[default0]:  2%|▏         | 160/6540 [01:12<40:23,  2.63it/s]
[default0]:  3%|▎         | 176/6540 [01:18<39:02,  2.72it/s]
[default0]:  3%|▎         | 192/6540 [01:23<37:54,  2.79it/s]
[default0]:  3%|▎         | 208/6540 [01:29<36:52,  2.86it/s]
[default0]:  3%|▎         | 224/6540 [01:34<36:00,  2.92it/s]
[default0]:  4%|▎         | 240/6540 [01:39<35:03,  3.00it/s]
[default0]:  4%|▍         | 256/6540 [01:44<34:21,  3.05it/s]
[default0]:  4%|▍         | 272/6540 [01:49<33:50,  3.09it/s]
[default0]:  4%|▍         | 288/6540 [01:54<33:21,  3.12it/s]
[default0]:  5%|▍         | 304/6540 [01:59<32:50,  3.17it/s]
```
Previously:
```
[default7]:}
[default7]:warning: provide_description is deprecated and will be removed in a future version in favor of description_dict
[default5]:Running loglikelihood requests
[default6]:Running loglikelihood requests
[default7]:Running loglikelihood requests
[default0]:  0%|          | 0/6540 [00:00<?, ?it/s]
[default0]:  0%|          | 16/6540 [00:04<31:16,  3.48it/s]
[default7]:Traceback (most recent call last):
[default7]:  File "./tasks/eval_harness/evaluate.py", line 446, in <module>
[default7]:    main()
[default7]:  File "./tasks/eval_harness/evaluate.py", line 429, in main
[default7]:    results = evaluator.evaluate(adaptor, {task_name: task}, False, 0, None, bootstrap_iters=args.bootstrap_iters)
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/eval/lm-evaluation-harness-thomas/lm_eval/utils.py", line 164, in _wrapper
[default7]:    return fn(*args, **kwargs)
[default7]:  File "/gpfsssd/worksf/projects/rech/six/commun/code/eval/lm-evaluation-harness-thomas/lm_eval/evaluator.py", line 247, in evaluate
[default7]:    resps = getattr(lm, reqtype)([req.args for req in reqs])
[default7]:  File "./tasks/eval_harness/evaluate.py", line 91, in loglikelihood
[default7]:    return self._loglikelihood_tokens(new_reqs)
[default7]:  File "./tasks/eval_harness/evaluate.py", line 157, in _loglikelihood_tokens
[default7]:    logits = self._model_call(torch.cat(inps, dim=0))
[default7]:  File "./tasks/eval_harness/evaluate.py", line 220, in _model_call
[default7]:    output = torch.cat(output, 0)[:len(inps)]
[default7]:RuntimeError: CUDA out of memory. Tried to allocate 15.49 GiB (GPU 7; 79.17 GiB total capacity; 60.01 GiB already allocated; 13.71 GiB free; 61.92 GiB reserved in total by PyTorch) If reserved$
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857715 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857716 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857717 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857718 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857719 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857720 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 857721 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 7 (pid: 857722) of binary: /gpfswork/rech/six/commun/conda/py38-pt111/bin/python
```