ColossalAI
[BUG]: --master_addr
🐛 Describe the bug
When I run the command "colossalai run --nproc_per_node 1 --master_addr GPU001 --master_port 29505 --host GPU001 main.py", it does not work, but the command "colossalai run --nproc_per_node 1 --master_addr localhost --master_port 29505 --host GPU001 main.py" works fine. What kind of problem might this be?
Environment
No response
Hi @bingokunkun, master_addr is used to set up the NCCL distributed groups, so it should be a specific address. host consists of the addresses used to set up SSH connections, and these can be any host names that SSH can resolve.
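One possible way to check this, as a rough sketch rather than anything ColossalAI itself provides: the snippet below asks the system resolver whether a name maps to an IP address. The helper name resolve_host is hypothetical. The working assumption, not confirmed in this thread, is that GPU001 may be known to SSH (for example through an alias in the SSH config) without being resolvable by the system resolver, which would make it usable for host but not for master_addr.

```python
import socket


def resolve_host(name):
    """Return the sorted, de-duplicated IP addresses that `name` resolves to.

    Hypothetical helper for illustration; returns [] if the system
    resolver cannot resolve the name.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "GPU001" is the host name from the report; replace it with your own.
    for name in ("localhost", "GPU001"):
        addrs = resolve_host(name)
        print(f"{name}: {addrs if addrs else 'not resolvable by the system resolver'}")
```

If GPU001 does not resolve here, passing the node's IP address as master_addr may be worth trying.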