lm-evaluation-harness
`parallelize=True` does not work on some tasks.
Python 3.10, lm-evaluation-harness at commit b8d1cef915ff82ec5d5c27a39c6c2b58a9171fac
I have a multi-GPU setup that I was using to shard a model across GPUs for benchmarking. During testing I ran into an issue where CoQA did not parallelize the model, while a task like PiQA did.
PiQA:

```shell
python main.py \
    --model hf-auto \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,parallelize=True \
    --tasks piqa \
    --batch_size auto:2
```
CoQA:

```shell
python main.py \
    --model hf-auto \
    --model_args pretrained=meta-llama/Llama-2-7b-hf,parallelize=True \
    --tasks coqa \
    --batch_size auto:2
```
While the PiQA task executes successfully, CoQA errors out due to insufficient GPU memory:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.99 GiB (GPU 0; 14.75 GiB total capacity; 13.45 GiB already allocated; 1.06 GiB free; 13.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
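For context on the `--batch_size auto:2` flag used above: automatic batch-size selection generally probes a candidate batch size and backs off when the forward pass runs out of memory. The snippet below is a minimal, hypothetical sketch of that idea in plain Python (the harness's real implementation differs; `run_with_auto_batch` and `forward` are illustrative names, not harness APIs), so it does not explain why CoQA alone fails to shard the model.

```python
def run_with_auto_batch(forward, start_batch_size, max_retries=2):
    """Hypothetical sketch: halve the batch size on each OOM until it fits.

    `forward` is any callable taking a batch size; here a plain MemoryError
    stands in for torch.cuda.OutOfMemoryError.
    """
    bs = start_batch_size
    for _ in range(max_retries + 1):
        try:
            # Attempt a forward pass at the current candidate batch size.
            return forward(bs), bs
        except MemoryError:
            # Back off: halve the batch size, but never go below 1.
            bs = max(1, bs // 2)
    raise RuntimeError("could not find a batch size that fits in memory")
```

With this kind of backoff, a per-task difference in sequence length (CoQA contexts are much longer than PiQA's) changes how much memory each batch needs, which is one plausible reason the same flags OOM on one task but not another.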
It is very strange that this issue would be dataset-dependent. Can you check whether it still occurs on the big-refactor branch?
Closing as abandoned.