
[BUG]: --master_addr

Open bingokunkun opened this issue 1 year ago • 2 comments

🐛 Describe the bug

When I run the command "colossalai run --nproc_per_node 1 --master_addr GPU001 --master_port 29505 --host GPU001 main.py", it does not work. But the command "colossalai run --nproc_per_node 1 --master_addr localhost --master_port 29505 --host GPU001 main.py" works fine. What could be causing this?

Environment

No response

bingokunkun avatar Apr 19 '23 09:04 bingokunkun

Bot detected the issue body's language is not English, translate it automatically.


Title: [BUG]: --master_addr

Issues-translate-bot avatar Apr 19 '23 09:04 Issues-translate-bot

Hi @bingokunkun, master_addr is used to set up the NCCL distributed groups, so it should be a specific address. host consists of the addresses used to set up SSH connections, and these can be any host names that SSH can resolve.

kurisusnowdeng avatar Apr 21 '23 03:04 kurisusnowdeng
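
Since master_addr needs to be a specific address, one thing that may be worth checking is what a name such as GPU001 actually resolves to on the machine where the process starts. The following is only a diagnostic sketch using the Python standard library, not part of ColossalAI. The helper names `resolve_master_addr` and `describe` are made up for illustration, and the thread does not confirm that name resolution is the cause of the failure.

```python
import ipaddress
import socket


def resolve_master_addr(name, port=29505):
    """Return the sorted IP addresses that `name` resolves to on this machine.

    Hypothetical helper for illustration; returns an empty list when the
    name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # info[4] is the socket address tuple; its first element is the IP string.
    # Drop any IPv6 scope suffix (e.g. "%eth0") so ipaddress can parse it.
    return sorted({info[4][0].split("%")[0] for info in infos})


def describe(name, port=29505):
    """Summarise how `name` resolves, flagging loopback addresses."""
    addrs = resolve_master_addr(name, port)
    if not addrs:
        return f"{name}: does not resolve on this machine"
    loopback = [a for a in addrs if ipaddress.ip_address(a).is_loopback]
    if loopback:
        return f"{name}: resolves to loopback address(es) {loopback}"
    return f"{name}: resolves to {addrs}"


if __name__ == "__main__":
    # GPU001 is the host name from the question; replace it with your own.
    for candidate in ("localhost", "GPU001"):
        print(describe(candidate))
```

If the name does not resolve, or resolves to an address you did not expect, passing the node's IP address directly as --master_addr is one way to rule name resolution out.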