
After some investigation, I think this issue is caused by the IOMMU being enabled in the OS. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#iommu-on-linux for more details. To verify it, you can run the following code to...
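A minimal sketch of such a check (the original snippet is truncated; this assumes a PyTorch build that exposes `cudaHostRegister` through `torch.cuda.cudart()`, the same underlying CUDA call DGL's `UnifiedTensor` makes to pin an existing host tensor):

```python
import torch

# Register an ordinary host tensor as pinned memory via the CUDA runtime.
# If the IOMMU interferes with pinned-memory registration, this call is
# where the failure should surface, independently of DGL.
x = torch.arange(10)
cudart = torch.cuda.cudart()
ret = cudart.cudaHostRegister(x.data_ptr(), x.numel() * x.element_size(), 0)
print('cudaHostRegister returned:', ret)  # non-zero indicates a CUDA error
cudart.cudaHostUnregister(x.data_ptr())
```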

The code doesn't disable the IOMMU. I just want to check whether the problem is caused by DGL or not. This PyTorch code calls the same underlying CUDA API as DGL...

Cannot reproduce... I ran with `python train_sampling_multi_gpu.py --gpu 0,1 --dataset ogbn-papers100M --data-device uva` and it works well. Can you change the following two lines

```python
train_nfeat = dgl.contrib.UnifiedTensor(train_nfeat, device=device)
train_labels...
```

Do you have any ideas on this issue? @nv-dlasalle @davidmin7

The renaming of `DLContext` -> `DLDevice` happened in DLPack 0.4 and was adopted by PyTorch 1.9 (the minimum version we support). So I believe supporting `->device` is enough. This is also...
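For context, a minimal sketch of the exchange in question (hedged: this only exercises PyTorch's DLPack interface from Python; the `->device` field access itself happens on the consumer's C side):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

# Round-trip a tensor through DLPack. With PyTorch >= 1.9 the exported
# capsule follows DLPack 0.4+, whose DLTensor carries a `device` field
# (DLDevice) instead of the pre-0.4 `ctx` (DLContext), so a consumer such
# as DGL can read `->device` unconditionally.
x = torch.arange(4)
capsule = to_dlpack(x)    # export: the capsule wraps a DLManagedTensor
y = from_dlpack(capsule)  # import: zero-copy view over the same memory
assert y.data_ptr() == x.data_ptr()
```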

I noticed that `cuh` was removed in commit [#b647be2](https://github.com/dmlc/dmlc-core/commit/b647be2dee985d77a12e8e41bc27382221938290). Any reason? @piiswrong

I have tried `opt_level=O1` and `O2`; they gave very similar results but didn't show much speedup in training time. I guess this is due to the communication bottleneck.
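For reference, a minimal sketch of the setup being compared (assuming NVIDIA Apex, whose `amp.initialize` takes the `opt_level` argument; `'O1'` patches selected ops to FP16 while `'O2'` casts the whole model to FP16 with FP32 master weights):

```python
import torch
from apex import amp  # assumes NVIDIA Apex is installed

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')  # or 'O2'

loss = model(torch.randn(32, 128, device='cuda')).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # backward pass on the loss-scaled graph
optimizer.step()
```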

This appears to be a mistake. `(0.1, 2.0)` is correct.

@SunDoge Also note that the `HueSaturationValue` in `byol_transform_a.py` may not be consistent with the torchvision version. See https://github.com/albumentations-team/albumentations/issues/672 and https://github.com/albumentations-team/albumentations/issues/698 for details.
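To illustrate the mismatch (parameter values here are illustrative, not the ones in `byol_transform_a.py`):

```python
import numpy as np
from PIL import Image
import albumentations as A
import torchvision.transforms as T

# The two libraries parameterize the jitter differently: albumentations
# shifts HSV channels by absolute limits, while torchvision's ColorJitter
# scales by fractional factors, so similar-looking settings need not match.
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
alb_out = A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30,
                               val_shift_limit=20, p=1.0)(image=img)['image']
tv_out = T.ColorJitter(brightness=0.4, contrast=0.4,
                       saturation=0.2, hue=0.1)(Image.fromarray(img))
```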

@guanfuchen First, training BYOL is naturally slow, because:
1. With the same architecture (ResNet-50, for example), BYOL performs more than twice the forward computation of supervised learning (see the sketch after this list).
2. ...
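An illustrative sketch of point 1 (names like `online_net` and `view_a` are hypothetical placeholders, and projector/predictor heads are folded into the networks for brevity):

```python
import torch
import torch.nn.functional as F

# Each BYOL step runs the online network on two augmented views and the
# momentum target network on both views as well: four encoder forwards per
# step, versus one for supervised training.
def byol_step(online_net, target_net, view_a, view_b):
    p_a, p_b = online_net(view_a), online_net(view_b)    # online forwards
    with torch.no_grad():                                # target forwards
        z_a, z_b = target_net(view_a), target_net(view_b)
    # symmetric negative cosine similarity between predictions and targets
    return -(F.cosine_similarity(p_a, z_b, dim=-1).mean()
             + F.cosine_similarity(p_b, z_a, dim=-1).mean())
```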