lm-evaluation-harness
`parallelize=True` does not work on some tasks.
Python 3.10, lm-evaluation-harness at commit b8d1cef915ff82ec5d5c27a39c6c2b58a9171fac
I have a multi-GPU setup that I was using to shard a model across GPUs for benchmarking. During testing I ran into an issue where CoQA did not parallelize the model, while a task like PiQA did.
PiQA:

```shell
python main.py \
    --model hf-auto \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,parallelize=True \
    --tasks piqa \
    --batch_size auto:2
```
CoQA:

```shell
python main.py \
    --model hf-auto \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,parallelize=True \
    --tasks coqa \
    --batch_size auto:2
```
While the PiQA task executes successfully, CoQA errors out due to insufficient GPU memory:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.99 GiB (GPU 0; 14.75 GiB total capacity; 13.45 GiB already allocated; 1.06 GiB free; 13.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
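For context on the `--batch_size auto:2` flag used above: automatic batch-size selection generally probes a candidate batch size and backs off when the forward pass runs out of memory. The snippet below is a minimal, hypothetical sketch of that idea in plain Python (the harness's real implementation differs; `run_with_auto_batch` and `forward` are illustrative names, not harness APIs), so it does not explain why CoQA alone fails to shard the model.

```python
def run_with_auto_batch(forward, start_batch_size, max_retries=2):
    """Hypothetical sketch: halve the batch size on each OOM until it fits.

    `forward` is any callable taking a batch size; here a plain MemoryError
    stands in for torch.cuda.OutOfMemoryError.
    """
    bs = start_batch_size
    for _ in range(max_retries + 1):
        try:
            # Attempt a forward pass at the current candidate batch size.
            return forward(bs), bs
        except MemoryError:
            # Back off: halve the batch size, but never go below 1.
            bs = max(1, bs // 2)
    raise RuntimeError("could not find a batch size that fits in memory")
```

With this kind of backoff, a per-task difference in sequence length (CoQA contexts are much longer than PiQA's) changes how much memory each batch needs, which is one plausible reason the same flags OOM on one task but not another.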
It is very strange that this issue would be dataset-dependent. Can you check whether it still occurs on the big-refactor branch?
Closing as abandoned.