Hydra can not recognize "--local_rank=0" argument

Open flycser opened this issue 3 years ago • 3 comments

🐛 Bug

When I ran a model in distributed mode (2 nodes, each with 2 GPUs) via hydra_train.py, Hydra could not accept arguments starting with "--", while torch.distributed.launch passes an argument of the form "--local_rank=0" to each process, which raises an "unrecognized argument" error. When I ran the model via the regular command line, it worked, because the argparse parser recognizes arguments starting with "--".

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run the command: python -m torch.distributed.launch --nproc-per-node 2 --nnodes 2 --node_rank 0 --master_addr "xxxxxx" --master_port 12345 fairseq_cli/hydra_train.py xxxxxxxxxxxxxx
  2. See error: "Unrecognized argument --local_rank=0" "Unrecognized argument --local_rank=1"
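
The mismatch can be illustrated without fairseq or Hydra. The sketch below is only an assumption about one possible workaround, not something confirmed in this thread: it uses argparse's parse_known_args to pull the launcher-injected "--local_rank=N" out of an argument list and leaves the Hydra-style key=value overrides untouched. The helper name split_launcher_args and the example overrides are hypothetical.

```python
import argparse


def split_launcher_args(argv):
    """Separate a launcher-injected --local_rank from the remaining arguments.

    Hypothetical helper: returns (local_rank or None, remaining_args).
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args does not fail on arguments it does not know;
    # it returns them, in order, as the second element.
    known, remaining = parser.parse_known_args(argv)
    return known.local_rank, remaining


# Example argument list as a launcher might pass it (override names are made up).
local_rank, overrides = split_launcher_args(
    ["--local_rank=1", "task=translation", "optimization.lr=[0.001]"]
)
```

In a wrapper script, one could assign the remaining list back to sys.argv[1:] before the Hydra entry point runs and pass the rank on separately. Whether that fits hydra_train.py as-is has not been verified here.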

Code sample

Expected behavior

Environment

  • fairseq Version: 0.10.2
  • PyTorch Version: 1.9.0
  • OS: Linux
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version: 3.7.0
  • CUDA/cuDNN version: 10.2/7.6.5
  • GPU models and configuration: Tesla P40
  • Any other relevant information:

Additional context

flycser avatar Jan 29 '22 06:01 flycser

Hi, have you solved this problem yet? I ran into the same one.

Dawn-970 avatar Mar 14 '22 03:03 Dawn-970

Update the torch version and try this command: torchrun --nproc-per-node 2 --nnodes 2 --node_rank 0 --master_addr "xxxxxx" --master_port 12345 fairseq_cli/hydra_train.py xxxxxxxxxxxxxx

Dawn-970 avatar Mar 14 '22 06:03 Dawn-970

@Dawn-970 Hi, I have run into the same issue. Can it be solved?

Rongjiehuang avatar Sep 13 '22 11:09 Rongjiehuang

It's so difficult!

xiyewang2 avatar Sep 14 '22 09:09 xiyewang2

@Dawn-970 Hi, I have run into the same issue. Can it be solved?

Use python -m torch.distributed.launch --use_env, and set cfg.distributed_training.device_id from os.environ['LOCAL_RANK'].
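
A minimal sketch of the second half of that suggestion, assuming the launcher exports LOCAL_RANK (which --use_env is meant to do). The cfg object here is a SimpleNamespace stand-in rather than the real fairseq config, and device_id_from_env is a hypothetical helper name:

```python
import os
from types import SimpleNamespace


def device_id_from_env(environ=os.environ, default=0):
    """Read the local rank exported by the launcher; fall back to `default`."""
    return int(environ.get("LOCAL_RANK", default))


# Stand-in for the real config object (assumption: same attribute path).
cfg = SimpleNamespace(distributed_training=SimpleNamespace(device_id=0))

# Simulate the environment of the second process on a node.
cfg.distributed_training.device_id = device_id_from_env({"LOCAL_RANK": "1"})
```

In an actual run the function would be called with no arguments so that it reads the real os.environ. Where exactly to apply it inside fairseq is left open here.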

flycser avatar Sep 15 '22 03:09 flycser