dmcp

os.environ['RANK'] KeyError: 'RANK'

Open Margrate opened this issue 4 years ago • 2 comments

I ran:

python main.py --mode train --data data1/ImageNetOrigin --config config/mbv2/retrain.yaml --flops 43 --chcfg ./results/DMCPMobileNetV2_43_MMDDHH/model_sample/expected_ch

and got:

Traceback (most recent call last):
  File "main.py", line 75, in <module>
    main()
  File "main.py", line 42, in main
    tools.init(config)
  File "/data1/task/tools/dmcp/utils/tools.py", line 28, in init
    dist.init_dist(config.distributed.enable)
  File "/data1/task/tools/dmcp/utils/distributed.py", line 29, in init_dist
    rank = int(os.environ['RANK'])
  File "/usr/local/miniconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

Margrate, Sep 02 '21 09:09

Turn off distributed training in the config (the traceback shows config.distributed.enable being passed to init_dist).

Muke6, May 10 '22 10:05
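To illustrate what "turning off" could mean in code, here is a minimal sketch of a guard that only reads RANK when distributed mode is enabled. The function name read_dist_env and its environ parameter are hypothetical and are not taken from the dmcp repository; the repository's own init_dist may behave differently.

```python
import os


def read_dist_env(enable, environ=None):
    """Return (rank, world_size) for this process.

    Hypothetical helper, not the repository's code. When distributed mode
    is disabled it falls back to a single process instead of reading RANK.
    """
    environ = os.environ if environ is None else environ
    if not enable:
        # Single-process run: no launcher is needed, so no RANK lookup.
        return 0, 1
    missing = [key for key in ("RANK", "WORLD_SIZE") if key not in environ]
    if missing:
        # Fail with a clearer message than a bare KeyError.
        raise RuntimeError(
            "distributed mode is enabled but these variables are unset: "
            + ", ".join(missing)
        )
    return int(environ["RANK"]), int(environ["WORLD_SIZE"])
```

With enable set to False the helper never touches the environment, which is why disabling distributed training avoids the KeyError in this sketch.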

I have a machine with 8 GPUs and 24 CPUs. Why doesn't this key exist in os.environ?

$ python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import os
>>> os.environ["RANK"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/azureml_py38/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

I specifically want to do distributed training.

monajalal, Dec 13 '22 15:12
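On why the key is missing: RANK is not something the OS or the hardware defines. It is an ordinary environment variable that a distributed launcher exports to each worker process it starts, so a plain python session will not see it regardless of how many GPUs the machine has. The sketch below demonstrates this with the standard library only; rank_seen_by_child is a hypothetical helper that plays the role of a launcher.

```python
import os
import subprocess
import sys


def rank_seen_by_child(extra_env):
    """Start a child Python process and return the RANK value it sees.

    Hypothetical helper: it stands in for a launcher, which would export
    RANK (among other variables) before starting each worker.
    """
    # Copy the current environment, dropping any RANK that is already set.
    env = {k: v for k, v in os.environ.items() if k != "RANK"}
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('RANK'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Child started the way a plain `python` invocation would start it.
    print(rank_seen_by_child({}))
    # Child started the way a launcher would start it, with RANK exported.
    print(rank_seen_by_child({"RANK": "0"}))
```

For real distributed training the variable normally comes from a launcher such as torchrun (for example `torchrun --nproc_per_node=8 main.py ...`), which exports RANK, LOCAL_RANK and WORLD_SIZE to every process. Whether this repository's main.py runs unmodified under that launcher is an assumption that has not been checked here.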