
DP and MP support

Open NathanHB opened this issue 1 year ago • 7 comments

adds support for both model and data parallelism

NathanHB avatar Jul 02 '24 15:07 NathanHB

Hi @NathanHB ! I'll aim to test this out very soon. Would you be willing to update the main README.md sections with the new intended usage for parallelize=True with / versus accelerate launch?

I'm torn with respect to multi-machine usage, we'd previously decided this was out-of-scope for HF models for us--is this something you all are using regularly for leaderboard evals though?

haileyschoelkopf avatar Jul 09 '24 13:07 haileyschoelkopf

Thanks @haileyschoelkopf ! Will update the readme.

> I'm torn with respect to multi-machine usage, we'd previously decided this was out-of-scope for HF models for us--is this something you all are using regularly for leaderboard evals though?

Yes, we use it for every model that does not fit on a single GPU. We need it to evaluate as fast as possible; some evals would take days without DP.

Why would it be out of scope for HF models?

NathanHB avatar Jul 10 '24 12:07 NathanHB

Note: this mostly allows running a single model with both data parallelism and pipeline parallelism across several GPUs of the same machine, so I'm unsure why you mention multi-machine usage. Both are currently available in the harness, but not at the same time for one model, which is what this PR enables.
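The combined mode can be pictured with a small sketch. `assign_gpus` below is a hypothetical helper, not the harness's actual code; it only illustrates how each of the `world_size` data-parallel processes could receive its own disjoint, interleaved slice of the machine's GPUs over which to pipeline the model (an assignment consistent with the per-process device maps visible in the run logs later in this thread):

```python
def assign_gpus(rank: int, world_size: int, num_gpus: int) -> list[int]:
    """Hypothetical sketch: give each data-parallel process an
    interleaved, disjoint slice of the machine's GPUs, so the model
    can be pipeline-parallelized across that slice.

    Not the harness's real logic -- just an illustration of how
    DP and MP can coexist on one machine.
    """
    return list(range(rank, num_gpus, world_size))

# With 4 GPUs and 2 processes, each process pipelines the model over
# 2 GPUs while the two processes split the eval data between them.
for rank in range(2):
    print(f"rank {rank}: GPUs {assign_gpus(rank, 2, 4)}")
# -> rank 0: GPUs [0, 2]
# -> rank 1: GPUs [1, 3]
```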

clefourrier avatar Jul 10 '24 13:07 clefourrier

Maybe this is user error on my part, but this seems not to be giving the desired result:

accelerate launch --num_processes 2 --multi_gpu lm_eval --model hf --model_args pretrained=gpt2,parallelize=True --tasks lambada_openai
2024-07-11:17:40:43,032 INFO     [__main__.py:272] Verbosity set to INFO
2024-07-11:17:40:43,042 INFO     [__main__.py:272] Verbosity set to INFO
2024-07-11:17:40:47,431 INFO     [__main__.py:369] Selected Tasks: ['lambada_openai']
2024-07-11:17:40:47,436 INFO     [evaluator.py:152] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-07-11:17:40:47,436 INFO     [evaluator.py:189] Initializing hf model, with arguments: {'pretrained': 'gpt2', 'parallelize': True}
2024-07-11:17:40:47,461 INFO     [__main__.py:369] Selected Tasks: ['lambada_openai']
2024-07-11:17:40:47,464 INFO     [evaluator.py:152] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-07-11:17:40:47,465 INFO     [evaluator.py:189] Initializing hf model, with arguments: {'pretrained': 'gpt2', 'parallelize': True}
2024-07-11:17:40:48,299 INFO     [huggingface.py:372] Model parallel was set to True, setting max memory per GPU to {1: 50485133312, 3: 50485133312} and device map to 'auto'
2024-07-11:17:40:48,301 INFO     [huggingface.py:372] Model parallel was set to True, setting max memory per GPU to {0: 50762940416, 2: 50487230464} and device map to 'auto'
2024-07-11:17:40:49,395 WARNING  [huggingface.py:270] You are both using a HF Accelerate `device_map` and launching via `accelerate launch`. This will attempt to do model and data parallelism depending on the resources available.
2024-07-11:17:40:49,501 WARNING  [huggingface.py:270] You are both using a HF Accelerate `device_map` and launching via `accelerate launch`. This will attempt to do model and data parallelism depending on the resources available.
2024-07-11:17:40:50,623 WARNING  [task.py:325] [Task: lambada_openai] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-07-11:17:40:50,623 WARNING  [task.py:325] [Task: lambada_openai] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-07-11:17:40:50,674 INFO     [evaluator.py:261] Setting fewshot random generator seed to 1234
2024-07-11:17:40:50,675 INFO     [task.py:411] Building contexts for lambada_openai on rank 0...
 11%|██████                                                 | 573/5153 [00:00<00:05, 808.93it/s]2024-07-11:17:40:51,453 WARNING  [task.py:325] [Task: lambada_openai] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-07-11:17:40:51,453 WARNING  [task.py:325] [Task: lambada_openai] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-07-11:17:40:51,504 INFO     [evaluator.py:261] Setting fewshot random generator seed to 1234
2024-07-11:17:40:51,504 INFO     [task.py:411] Building contexts for lambada_openai on rank 0...
100%|██████████████████████████████████████████████████████| 5153/5153 [00:06<00:00, 825.19it/s]
2024-07-11:17:40:56,971 INFO     [evaluator.py:438] Running loglikelihood requests
100%|██████████████████████████████████████████████████████| 5153/5153 [00:06<00:00, 820.96it/s]
2024-07-11:17:40:57,831 INFO     [evaluator.py:438] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████████████████| 5153/5153 [00:38<00:00, 133.56it/s]
Running loglikelihood requests:  99%|█████████████████████▉| 5127/5153 [00:39<00:00, 137.75it/s]bootstrapping for stddev: perplexity
Running loglikelihood requests: 100%|██████████████████████| 5153/5153 [00:39<00:00, 130.84it/s]
100%|█████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 88.52it/s]
bootstrapping for stddev: perplexity
100%|█████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 88.34it/s]
2024-07-11:17:41:43,189 INFO     [evaluation_tracker.py:240] Output path not provided, skipping saving results aggregated
hf (pretrained=gpt2,parallelize=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|    Tasks     |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.3256|±  |0.0065|
|              |       |none  |     0|perplexity|↓  |40.0554|±  |1.4787|

2024-07-11:17:41:45,522 INFO     [evaluation_tracker.py:240] Output path not provided, skipping saving results aggregated
hf (pretrained=gpt2,parallelize=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|    Tasks     |Version|Filter|n-shot|  Metric  |   | Value |   |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|lambada_openai|      1|none  |     0|acc       |↑  | 0.3256|±  |0.0065|
|              |       |none  |     0|perplexity|↓  |40.0554|±  |1.4787|

This is what I get when trying to run on 4 GPUs with --num_processes 2:

  • Results print twice. This seems to imply that both processes think they are rank 0 and the only process.
  • The conditional at L269 seems to let us fall through all the other conditions there, meaning rank and world size never get set to anything other than the defaults (rank=0, world_size=1).
  • Request counts aren't split in half, also indicating that lm.world_size and lm.rank aren't set correctly. We should see half the requests run on each process, and then only a single rank report back/save results and print tables.
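The expected behavior can be sketched as follows (`shard_requests` is a hypothetical helper, mirroring the every-`world_size`-th-request pattern commonly used for data parallelism): with `world_size=2`, each rank should build roughly half of the 5153 lambada_openai requests, instead of all of them as in the log above.

```python
import itertools

def shard_requests(requests, rank: int, world_size: int):
    """Sketch of the expected sharding: rank r handles every
    world_size-th request starting at index r. Hypothetical helper,
    not the harness's actual implementation."""
    return list(itertools.islice(requests, rank, None, world_size))

requests = list(range(5153))  # stand-in for the lambada_openai requests
shards = [shard_requests(requests, r, world_size=2) for r in range(2)]
print([len(s) for s in shards])  # -> [2577, 2576]: half each, not 5153 twice
```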

haileyschoelkopf avatar Jul 11 '24 17:07 haileyschoelkopf

@haileyschoelkopf I added a fix for the duplication of logs - can you tell me if it's better on your side?

clefourrier avatar Jul 15 '24 10:07 clefourrier

Hi @haileyschoelkopf! The linter check is not passing because of files I did not modify. I merged main and ran the linter again, which modified a few more files than necessary in the PR.

NathanHB avatar Jul 15 '24 11:07 NathanHB

Hi @NathanHB , #2104 should fix the linters!

haileyschoelkopf avatar Jul 15 '24 15:07 haileyschoelkopf

Thanks a lot for the fix, merged it to the PR!

clefourrier avatar Aug 05 '24 06:08 clefourrier