Albert Zeyer
I also would like to have this. I was really confused that this is not yet supported and thought that I must have done something wrong. I really wonder why...
What PyTorch version? It's correct that `_has_data` is on CPU. But there should also be a distributed backend for it? I thought the way that we init PyTorch distributed is...
Did you set `backend`? What are your options? What happens if you set `backend` to `"cpu:gloo,cuda:nccl"` or something like that? Maybe the `init_process_group` behavior changed in PyTorch 2.6.
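Just for reference, a minimal sketch of what such a combined backend string means at the plain PyTorch level (the env-var based rendezvous via torchrun is an assumption here; RETURNN's own init code may differ):

```python
import torch
import torch.distributed as dist

# Sketch, assuming torchrun (or similar) has set RANK, WORLD_SIZE,
# MASTER_ADDR and MASTER_PORT.
# The combined backend string registers Gloo for CPU tensors and NCCL for
# CUDA tensors, so collectives on CPU tensors (e.g. a has-data flag)
# work without first copying them to the GPU.
dist.init_process_group(backend="cpu:gloo,cuda:nccl")

rank = dist.get_rank()

# CPU collective, handled by Gloo:
flag = torch.tensor([1], dtype=torch.int64)
dist.all_reduce(flag, op=dist.ReduceOp.MIN)

# CUDA collective, handled by NCCL (assuming one GPU per rank):
if torch.cuda.is_available():
    torch.cuda.set_device(rank % torch.cuda.device_count())
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x)
```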
@NeoLegends so you never fixed this yet?

> If I set `backend = "cpu:gloo,cuda:nccl"`

I meant within the distrib options.
Why do you close this? This is not closed, unless it is fixed.
> I tested your suggestion (`backend = "cpu:gloo,cuda:nccl"`), there is a bug in torch/distributed.py, but I'm now able to run it after fixing the bug. I will make a PR for...
> But as Moritz said, it would be better to set that as a default.

Yes, at least that. That's maybe not enough. The behavior changed here in some PyTorch...
One possible way, for example (I'm not sure whether this makes sense or is easy to do): after the torch distributed init, check whether there is a backend type associated with the device...
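A rough sketch of how such a check could look. This just probes a tiny CPU collective instead of inspecting the process group internals, so it avoids private PyTorch APIs; treat it as one possible approach, not necessarily what we would end up doing:

```python
import torch
import torch.distributed as dist


def check_cpu_backend_available() -> bool:
    """
    Sketch: after dist.init_process_group(...), probe whether collectives on
    CPU tensors actually work, i.e. whether a CPU-capable backend (e.g. Gloo)
    is registered for the default process group.
    Note: this runs a collective, so all ranks must call it together.
    """
    assert dist.is_initialized()
    try:
        probe = torch.zeros(1)  # CPU tensor
        dist.all_reduce(probe)
        return True
    except RuntimeError:
        # E.g. NCCL-only process group: no backend type associated with
        # device type cpu.
        return False
```

If such a check fails, we could either raise a clear error early, or fall back to moving the CPU-side flags (like `_has_data`) to CUDA before the collective.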
> I assume we always used default None to start both nccl and gloo.

That was exactly my question. Is this the case? Maybe ask around whether someone has used...
(The torch.distributed output is somehow messed up. Do you have a version which is not messed up?)