
DPO Trainer Crashes on multi-gpu setup!

Open August-murr opened this issue 1 year ago • 3 comments

System Info

Kaggle Notebook with 2x T4 GPUs. Link to Kaggle notebook: https://www.kaggle.com/code/augustmurr/dpo-issue-recreation. The issue does not occur when loading the model on a single GPU (for example `"cuda:0"`), but then the trainer only uses 1 GPU, which is very inefficient.

Who can help?

@muellerzr @pacman100

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

link to Kaggle notebook: https://www.kaggle.com/code/augustmurr/dpo-issue-recreation

Expected behavior

Training should run across both GPUs. Instead, the trainer crashes with:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

August-murr avatar Mar 12 '24 19:03 August-murr

This looks to be an issue with how the weights are loaded with device_map="auto" rather than with the trainer. It may be related to _no_split_modules.
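To make the failure mode concrete: with device_map="auto", accelerate's big-model loading shards the model's modules across the available GPUs, so an activation produced on one GPU can meet a weight living on the other. The toy sketch below (no GPUs or torch required; the `FakeTensor` class and device strings are hypothetical stand-ins) mimics how a cross-device op raises the exact RuntimeError reported above:

```python
# Toy illustration of the device-mismatch crash. FakeTensor is a hypothetical
# stand-in for a torch tensor: it carries a "device" tag and, like torch,
# refuses to combine values that live on different devices.

class FakeTensor:
    def __init__(self, value, device):
        self.value = value
        self.device = device

    def __add__(self, other):
        # Mirrors torch's behavior: cross-device ops raise instead of
        # silently copying data between devices.
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {self.device} and {other.device}!"
            )
        return FakeTensor(self.value + other.value, self.device)

# device_map="auto" places some modules on cuda:0 and the rest on cuda:1,
# so two intermediate values can end up on different GPUs:
hidden = FakeTensor(1.0, "cuda:0")
logits = FakeTensor(2.0, "cuda:1")

try:
    hidden + logits
except RuntimeError as e:
    print(e)
```

A trainer that assumes the whole model sits on one device (or one replica per process) trips over exactly this kind of cross-device operation.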

cc @younesbelkada @ArthurZucker

amyeroberts avatar Apr 12 '24 09:04 amyeroberts

Hi @August-murr! I agree with what @amyeroberts said: the problem is likely that you are loading the model with device_map="auto". To correctly perform multi-GPU training, please refer to this comment: https://github.com/huggingface/accelerate/issues/1840#issuecomment-1683105994 - let us know how it goes!
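The pattern recommended in that linked comment is to give each training process a full copy of the model on its own GPU rather than sharding one copy across GPUs. A key detail is that an empty-string key in a device_map means "all modules". A minimal sketch of that idea follows; the `per_process_device_map` helper is hypothetical, and the commented-out transformers/accelerate calls assume a script launched with `accelerate launch` or `torchrun`:

```python
# Sketch of per-process model placement for multi-GPU data-parallel training.
# Instead of device_map="auto" (which shards one model across GPUs), each
# process loads the *entire* model onto the GPU it owns.

def per_process_device_map(process_index):
    """Build a device_map that puts the whole model on one GPU.

    The empty-string key matches every module, so {"": 0} loads the full
    model on cuda:0, {"": 1} on cuda:1, and so on.
    """
    return {"": process_index}

# In the actual training script this would be used roughly as follows
# (run with `accelerate launch train.py` so two processes are spawned,
# one per T4):
#
#   from accelerate import PartialState
#   from transformers import AutoModelForCausalLM
#
#   device_map = per_process_device_map(PartialState().process_index)
#   model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device_map)
#
# Each process then trains a full replica and gradients are synchronized
# across GPUs, so no tensor ever needs to cross from cuda:0 to cuda:1
# inside a forward pass.
```

This trades memory (one full copy per GPU) for correctness with DDP-style trainers; model-parallel sharding via device_map="auto" is intended for inference, not for this training setup.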

younesbelkada avatar Apr 16 '24 08:04 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 10 '24 08:05 github-actions[bot]