CodeT5

Scalar issue: DataParallel with 2 GPUs

eswarthammana opened this issue 2 years ago • 6 comments

Dear Team,

I tried to train the model with 2 GPUs (devices 0,1) and I faced the following problem, which I have not faced with a single GPU. Could you please help me solve the issue?

Environment: Kaggle Accelerator: GPU T4 x 2

/opt/conda/lib/python3.7/site-packages/transformers/optimization.py:395: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  FutureWarning,
Training:   0%|          | 0/3125 [00:00<?, ?it/s]
/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
[0] Train loss 0.258: 100%|██████████| 3125/3125 [29:17<00:00, 1.78it/s]
100%|██████████| 2000/2000 [00:07<00:00, 273.69it/s]
Eval ppl:   0%|          | 0/63 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/kaggle/working/CodeT5/run_gen.py", line 387, in <module>
    main()
  File "/kaggle/working/CodeT5/run_gen.py", line 265, in main
    eval_ppl = eval_ppl_epoch(args, eval_data, eval_examples, model, tokenizer)
  File "/kaggle/working/CodeT5/run_gen.py", line 75, in eval_ppl_epoch
    eval_loss += loss.item()
ValueError: only one element tensors can be converted to Python scalars

eswarthammana avatar Apr 17 '23 04:04 eswarthammana

I faced a similar issue. I added a condition like below in run_gen.py (line 75):

outputs = model(input_ids=source_ids, attention_mask=source_mask,
                labels=target_ids, decoder_attention_mask=target_mask)
loss = outputs.loss
if args.n_gpu > 1:
    # DataParallel returns one loss per GPU; average them back to a scalar
    loss = loss.mean()

It now works for me.
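
Background on why the mean() is needed (an illustrative snippet, not CodeT5 code): DataParallel computes one loss per replica and gathers them along dim 0 (that is what the "all input tensors were scalars" warning is about), so with two GPUs loss is a length-2 vector and loss.item() raises the ValueError from the traceback.

import torch

loss = torch.tensor([0.25, 0.27])   # e.g. the two per-GPU losses gathered by DataParallel
# loss.item()                       # ValueError: only one element tensors can be converted to Python scalars
print(loss.mean().item())           # averaging first works for 1 GPU and N GPUs alike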

alibrahimzada avatar May 04 '23 21:05 alibrahimzada

Hi, I'm unable to finetune with multiple GPUs. Can @eswarthammana or @alibrahimzada tell me about any modifications required to the scripts for this?

Tx

Sleepyhead01 avatar May 30 '23 19:05 Sleepyhead01

Make sure you execute your script with torchrun rather than python3/python. I don't think there are any other requirements for multi-GPU execution.

alibrahimzada avatar May 30 '23 23:05 alibrahimzada

Hi @Sleepyhead01,

What I tried: at the end of exp_with_args.sh there is CUDA_VISIBLE_DEVICES=${GPU}; modify that ${GPU} value to 0,1 directly in the script. Passing it through the code does not work because the argument accepts only a single integer, so you cannot pass more than one device that way.

And as @alibrahimzada mentioned, change the loss to loss.mean().
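
Roughly what those two edits amount to on the Python side (a sketch with a stand-in model, not the actual run_gen.py code):

import os
import torch

# What CUDA_VISIBLE_DEVICES=0,1 in exp_with_args.sh achieves: expose both GPUs
# to the process before torch touches CUDA.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

model = torch.nn.Linear(8, 8).to(device)   # stand-in for the CodeT5 model
if n_gpu > 1:
    # DataParallel replicates the model across the visible GPUs; the loss it
    # returns is then a per-GPU vector, hence the loss.mean() change above.
    model = torch.nn.DataParallel(model)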

eswarthammana avatar May 31 '23 04:05 eswarthammana

With this modification, training on multiple GPUs starts fine. However, eval_bleu_epoch gives the following error:

Traceback (most recent call last):
  File "CodeT5/run_gen.py", line 392, in <module>
    main()
  File "CodeT5/run_gen.py", line 319, in main
    result = eval_bleu_epoch(args, eval_data, eval_examples, model, tokenizer, 'dev', 'e%d' % cur_epoch)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CodeT5/run_gen.py", line 109, in eval_bleu_epoch
    preds = model.generate(source_ids,
            ^^^^^^^^^^^^^^
  File "anaconda3/envs/Old_R/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'generate'

Any fix for this? Tx

Sleepyhead01 avatar May 31 '23 18:05 Sleepyhead01

@Sleepyhead01 you need to call model.module.generate(): when n_gpu > 1 the model is wrapped in DataParallel, and the underlying model is stored in its .module attribute, so you have to go through .module to reach generate().
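
For reference, a tiny self-contained repro of the error and the unwrap, using a stand-in module rather than CodeT5 itself (keep whatever arguments run_gen.py already passes to generate()):

import torch

class Toy(torch.nn.Module):
    # Stand-in for the CodeT5 model: only the method name matters here.
    def generate(self, x):
        return x

model = torch.nn.DataParallel(Toy())
# model.generate(...)              # AttributeError: 'DataParallel' object has no attribute 'generate'
gen_model = model.module if hasattr(model, "module") else model   # unwrap if wrapped
print(gen_model.generate(torch.ones(1)))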

Unfortunately the authors have not kept these scripts up to date with newer versions of torch.

alibrahimzada avatar May 31 '23 18:05 alibrahimzada