
GSpMM for pytorch causes autocast warning for cpu-only versions of torch


🐛 Bug

See PyTorch issue (thanks to @as51340 for raising this): https://github.com/pytorch/pytorch/issues/83733

And reproducible code here: https://gist.github.com/as51340/69691482e2a204e2a8e12c07954e0553

From what I can tell, the issue is caused by the cast in the decorator on the forward of GSpMM: https://github.com/dmlc/dgl/blob/master/python/dgl/backend/pytorch/sparse.py#L104
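For context, a minimal sketch of the pattern in question (simplified names, not DGL's actual GSpMM code): the autograd Function's forward is wrapped with `torch.cuda.amp.custom_fwd(cast_inputs=th.float16)`, so the CUDA AMP machinery gets involved even on a CPU-only install.

```python
import torch as th
from torch.cuda.amp import custom_fwd, custom_bwd

# Simplified sketch (not DGL's actual GSpMM): the forward pass is decorated
# with custom_fwd(cast_inputs=th.float16), mirroring the pattern in
# python/dgl/backend/pytorch/sparse.py.
class GSpMMLike(th.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=th.float16)
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return x @ y

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors
        return grad_out @ y.t(), x.t() @ grad_out

# On a CPU-only build of PyTorch 1.12.x, running a Function like this under
# autocast can surface the warning described in the linked PyTorch issue.
```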

Environment

  • DGL Version (e.g., 1.0): 0.9.0 and 0.9.1 (cuda and non-cuda versions)
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.12.1 (cpu-only)
  • OS (e.g., Linux): linux
  • How you installed DGL (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.8
  • CUDA/cuDNN version (if applicable): N/A
  • GPU models and configuration (e.g. V100): N/A

nv-dlasalle · Sep 19 '22 18:09

@ptrblck do you know if the decorator @custom_fwd(cast_inputs=th.float16) is necessary to make a PyTorch module work with autocast/fp16? Or should that automatically be handled when a user wraps their code in with autocast(enabled=True) and uses GradScaler for PyTorch modules?

We could potentially fix this by checking whether CUDA is available in torch before using the decorators from torch.cuda.amp, but that doesn't seem like the right way to address it.
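For illustration only, a rough sketch of that availability check (hypothetical no-op fallbacks, not a proposed patch):

```python
import torch as th

# Hypothetical workaround sketch: only pull in the torch.cuda.amp decorators
# when CUDA is available; otherwise substitute no-op decorators so the rest
# of the module can stay unchanged.
if th.cuda.is_available():
    from torch.cuda.amp import custom_fwd, custom_bwd
else:
    def custom_fwd(fwd=None, *, cast_inputs=None):
        # No-op stand-in for torch.cuda.amp.custom_fwd on CPU-only builds.
        if fwd is None:
            return lambda f: f
        return fwd

    def custom_bwd(bwd):
        # No-op stand-in for torch.cuda.amp.custom_bwd.
        return bwd
```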

nv-dlasalle · Sep 19 '22 18:09

The decorator should be necessary for custom autograd modules.

One alternative might be _cast_if_autocast_enabled from Apex. An example is here. It also helps make custom autograd modules work with both fp16 and bf16.
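Roughly, that pattern looks like the sketch below: cast the inputs up front based on the active autocast dtype, then run the autograd Function with autocast disabled, so no custom_fwd(cast_inputs=...) decorator is needed. The helper mirrors Apex's _cast_if_autocast_enabled, GSpMMFunction is a hypothetical stand-in for DGL's Function, and the private _cast helper's location may vary across PyTorch versions.

```python
import torch

class GSpMMFunction(torch.autograd.Function):
    # Hypothetical stand-in for DGL's GSpMM autograd Function; note there is
    # no custom_fwd(cast_inputs=...) decorator on forward.
    @staticmethod
    def forward(ctx, gidx, op, reduce_op, lhs, rhs):
        ctx.save_for_backward(lhs, rhs)
        return lhs @ rhs  # placeholder computation

    @staticmethod
    def backward(ctx, grad_out):
        lhs, rhs = ctx.saved_tensors
        return None, None, None, grad_out @ rhs.t(), lhs.t() @ grad_out

def _cast_if_autocast_enabled(*args):
    # Mirrors the Apex helper: when autocast is active, cast the arguments to
    # the current autocast dtype (fp16 or bf16) via PyTorch's private _cast.
    if not torch.is_autocast_enabled():
        return args
    return torch.cuda.amp.autocast_mode._cast(args, torch.get_autocast_gpu_dtype())

def gspmm(gidx, op, reduce_op, lhs, rhs):
    # Call-site pattern: cast inputs first, then run with autocast disabled.
    args = _cast_if_autocast_enabled(lhs, rhs)
    with torch.cuda.amp.autocast(enabled=False):
        return GSpMMFunction.apply(gidx, op, reduce_op, *args)
```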

yaox12 · Sep 20 '22 02:09

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] · Oct 24 '22 01:10