lightning-hydra-template
Training gets stuck when submitting a job to Slurm with multi-GPU DDP
The training gets stuck and I get the following error:
The client socket has failed to connect to [ip6-localhost]:24355 (errno: 99 - Cannot assign requested address)
Need help with this.
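For anyone trying to narrow this down: errno 99 is EADDRNOTAVAIL on Linux, and `ip6-localhost` is an alias for the IPv6 loopback address `::1` in the default `/etc/hosts` on Debian/Ubuntu systems. One possible reading, which is only an assumption and not confirmed here, is that the rendezvous address resolves to IPv6 loopback on a node where that address is not usable. Below is a minimal diagnostic sketch that reports which IP families a host name resolves to on the current node. The helper name `address_families` is hypothetical, and checking `MASTER_ADDR` assumes the job uses the standard `torch.distributed` environment-variable rendezvous.

```python
import socket


def address_families(host, port=0):
    """Return the sorted IP families that `host` resolves to on this node.

    Hypothetical diagnostic helper, not part of lightning-hydra-template.
    """
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve at all on this node.
        return []
    # info[0] is the address family of each resolved entry.
    return sorted({names.get(info[0], str(info[0])) for info in infos})


# Example use on a compute node (assumes MASTER_ADDR is set by the launcher):
#   import os
#   print(address_families("localhost"))
#   print(address_families(os.environ["MASTER_ADDR"]))
```

If `localhost` or the master address comes back as IPv6 only, that would at least be consistent with the error above, though it does not by itself prove the cause.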
same question