ViLT
```
python run.py with data_root=content/datasets num_gpus=2 num_nodes=1 task_mlm_itm whole_word_masking=True step100k per_gpu_batchsize=64
```
I encounter this when I pre-train with COCO:

```
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
INFO - timm.models.helpers - Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p32_384-830016f5.pth)
GPU available: True, used: True
INFO - lightning - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
INFO - lightning - TPU available: None, using: 0 TPU cores
Using environment variable NODE_RANK for node rank ().
INFO - lightning - Using environment variable NODE_RANK for node rank ().
ERROR - ViLT - Failed after 0:00:06!
Traceback (most recent calls WITHOUT Sacred internals):
  File "run.py", line 67, in main
    val_check_interval=_config["val_check_interval"],
  File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
  File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 359, in __init__
    deterministic,
  File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 127, in on_trainer_init
    self.trainer.node_rank = self.determine_ddp_node_rank()
  File "/data/fyuan/anaconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator_connector.py", line 415, in determine_ddp_node_rank
    return int(rank)
ValueError: invalid literal for int() with base 10: ''
```
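My reading of the traceback (a sketch, not verified against the Lightning source beyond what the trace shows): the trainer picks the node rank up from the `NODE_RANK` environment variable and converts it with `int()`, so a `NODE_RANK` that is set but empty fails with exactly this ValueError. A quick check:

```bash
# If NODE_RANK exists but is blank, int('') inside Lightning raises the
# "invalid literal for int() with base 10: ''" error from the traceback.
echo "NODE_RANK='${NODE_RANK}'"   # prints NODE_RANK='' when the variable is blank
```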
Have you solved it? I have the same bug.
Not yet bro.
I solved it bro! Don't forget to set these variables:

```
export MASTER_ADDR=$DIST_0_IP
export MASTER_PORT=$DIST_0_PORT
export NODE_RANK=$DIST_RANK
```
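To make that concrete, here is a sketch of what those exports might look like; the IP, port, and rank values below are placeholders for your own cluster, not something taken from this issue:

```bash
# Values are placeholders; adjust them to your own setup.
export MASTER_ADDR=192.168.1.10   # IP of the node that hosts rank 0
export MASTER_PORT=29500          # any free TCP port, the same on every node
export NODE_RANK=0                # 0 on the first node, 1 on the second, ...
```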
it works, thanks!
Nice job! :)
> I solved it bro! Don't forget to set these variables!
> export MASTER_ADDR=$DIST_0_IP
> export MASTER_PORT=$DIST_0_PORT
> export NODE_RANK=$DIST_RANK

If I use one machine with 8 GPUs, how should I set these variables?
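Not an authoritative answer, but a sketch of what should work for a single machine: the master address is the machine itself, the port is any free one, and the node rank is 0. The run.py flags below mirror the command at the top of this issue, with num_gpus raised to 8.

```bash
# Single machine, 8 GPUs: everything runs on localhost as node 0.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
export NODE_RANK=0
python run.py with data_root=content/datasets num_gpus=8 num_nodes=1 task_mlm_itm whole_word_masking=True step100k per_gpu_batchsize=64
```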