123 comments of Xin Yao

I agree with you. @SSARCandy This is my implementation, any advice?

```python
def coral_loss(source, target):
    d = source.size(1)
    ns, nt = source.size(0), target.size(0)
    # source covariance
    tmp_s = torch.ones((1, ns))...
```
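
Since the listing above is cut off by the comment excerpt, here is a complete, self-contained sketch along the same lines, following the standard CORAL formulation loss = ||C_S − C_T||_F² / (4d²). Everything beyond the truncated line is a reconstruction for illustration, not necessarily the original code.

```python
import torch

def coral_loss(source, target):
    # source: (ns, d) source features, target: (nt, d) target features
    d = source.size(1)
    ns, nt = source.size(0), target.size(0)

    # source covariance: (X_s^T X_s - (1^T X_s)^T (1^T X_s) / ns) / (ns - 1)
    tmp_s = torch.ones((1, ns)) @ source
    cs = (source.t() @ source - (tmp_s.t() @ tmp_s) / ns) / (ns - 1)

    # target covariance, same construction
    tmp_t = torch.ones((1, nt)) @ target
    ct = (target.t() @ target - (tmp_t.t() @ tmp_t) / nt) / (nt - 1)

    # squared Frobenius norm of the difference, scaled by 1 / (4 d^2)
    return ((cs - ct) ** 2).sum() / (4 * d * d)
```

Note that `torch.ones((1, ns))` is created on the CPU here, which is exactly the device caveat discussed in a later comment.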

@redhat12345 My code is based on `PyTorch>=0.4`, in which `torch.Tensor` and `Variable` have been merged.

@redhat12345 If `source` and `target` are CUDA tensors, then `torch.ones((1, ns))` should be `torch.ones((1, ns)).cuda()`, and likewise for `nt`. I have tried with this loss and find...
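
A device-agnostic alternative (a sketch, not necessarily what the original code does) is to create the ones vectors on the same device and dtype as the inputs, so no explicit `.cuda()` call is needed:

```python
# Sketch: build the auxiliary ones vectors on the inputs' device/dtype,
# so the loss works on both CPU and GPU without .cuda() calls.
ones_s = torch.ones((1, ns), device=source.device, dtype=source.dtype)
ones_t = torch.ones((1, nt), device=target.device, dtype=target.dtype)
```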

> @yaox12 I saw you've closed PR4384. Is this still an issue or you will find other resolutions?

Either Quan's suggestion or specifying `-DCUDA_ARCH_NAME=All` should work.

Cannot reproduce. Can you share more env information, e.g., OS, RAM, etc.?

Not yet. cc @chang-l @TristonC

How many GPUs did you use? Have you changed arguments such as `--graph-device` or `--data-device`?

Can you try adding `--shm-size=64g` (large enough to store the whole graph) to your `docker run` command?

This code in the dataloader creates a shared-memory array for shuffling: https://github.com/dmlc/dgl/blob/5ba5106acab6a642e9b790e5331ee519112a5623/python/dgl/dataloading/dataloader.py#L146-L149 When `len(train_seeds)` > 8M, the shared tensor will exceed Docker's default shm size...
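
A rough illustration of why roughly 8M seeds is the tipping point (this is a sketch of the idea, not DGL's actual code): the seed IDs are placed in a shared-memory tensor, typically backed by `/dev/shm`, and 8M int64 IDs already take about 61 MiB, right around Docker's default `--shm-size` of 64 MB.

```python
import torch

# Sketch (illustrative, not DGL's actual code): the dataloader puts the
# seed IDs into a shared-memory tensor so worker processes can shuffle
# and read them without copying.
train_seeds = torch.arange(8_000_000, dtype=torch.int64)  # node IDs are int64
shared_seeds = train_seeds.clone().share_memory_()        # typically backed by /dev/shm

size_mib = shared_seeds.numel() * shared_seeds.element_size() / 2**20
print(f"{size_mib:.1f} MiB")  # ~61 MiB, close to Docker's default 64 MB shm size
```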

According to the comments, the shared tensor is used for `persistent_workers=True` (or `num_workers > 0`, I think?). We can change the code to use shared tensors only when these conditions...
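
As a sketch of what that change could look like (the function name and signature are illustrative, this is not the actual patch):

```python
import torch

def maybe_share_seeds(train_seeds: torch.Tensor,
                      num_workers: int,
                      persistent_workers: bool) -> torch.Tensor:
    # Only move the seeds into shared memory when worker processes need them;
    # otherwise keep the plain tensor and avoid touching /dev/shm at all.
    if persistent_workers or num_workers > 0:
        return train_seeds.clone().share_memory_()
    return train_seeds
```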