GSpMM for pytorch causes autocast warning for cpu-only versions of torch
🐛 Bug
See PyTorch issue (thanks to @as51340 for raising this): https://github.com/pytorch/pytorch/issues/83733
And reproducible code here: https://gist.github.com/as51340/69691482e2a204e2a8e12c07954e0553
From what I can tell, the issue is caused by the cast_inputs cast in the decorator on GSpMM's forward: https://github.com/dmlc/dgl/blob/master/python/dgl/backend/pytorch/sparse.py#L104
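For reference, here is a minimal sketch of the decorator pattern in question (not the actual GSpMM implementation; the operator body below is a stand-in, and the real repro is in the gist linked above):

```python
import torch as th
from torch.cuda.amp import custom_bwd, custom_fwd


class ToyGSpMMLike(th.autograd.Function):
    """Stand-in for GSpMM: same decorator usage, simplified math."""

    @staticmethod
    @custom_fwd(cast_inputs=th.float16)  # fp16 cast requested even on CPU-only builds
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a @ b

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        return grad_out @ b.t(), a.t() @ grad_out


# Calling the decorated function on a CPU-only install of PyTorch is where the
# autocast-related warning is reported (see the linked gist for the real repro).
x = th.randn(4, 8, requires_grad=True)
y = th.randn(8, 3, requires_grad=True)
ToyGSpMMLike.apply(x, y).sum().backward()
```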
Environment
- DGL Version (e.g., 1.0): 0.9.0 and 0.9.1 (cuda and non-cuda versions)
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.12.1 (cpu-only)
- OS (e.g., Linux): linux
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.8
- CUDA/cuDNN version (if applicable): N/A
- GPU models and configuration (e.g. V100): N/A
@ptrblck do you know if the decorator @custom_fwd(cast_inputs=th.float16) is necessary to make a PyTorch module work with autocast/fp16? Or should that automatically be handled when a user wraps the forward pass in with autocast(enabled=True) and uses GradScaler with their PyTorch modules?
We could potentially fix this by checking whether CUDA is available in torch before applying the decorators from torch.cuda.amp, but that doesn't seem like the right way to address it.
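A rough sketch of what that workaround could look like (hypothetical, not what DGL ships): fall back to pass-through decorators when CUDA is unavailable, so torch.cuda.amp is never touched on CPU-only builds.

```python
import torch as th

if th.cuda.is_available():
    from torch.cuda.amp import custom_bwd, custom_fwd
else:
    # Pass-through replacements so CPU-only builds never import/apply torch.cuda.amp decorators.
    def custom_fwd(fwd=None, *, cast_inputs=None):
        if fwd is None:
            # Used as @custom_fwd(cast_inputs=...)
            return lambda f: f
        # Used as a bare @custom_fwd
        return fwd

    def custom_bwd(bwd):
        return bwd
```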
The decorator should be necessary for custom autograd modules.
One alternative might be _cast_if_autocast_enabled from Apex. An example is here. It also helps make custom autograd functions work with both fp16 and bf16.
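For context, a hedged sketch of that Apex-style pattern (adapted rather than copied from Apex; the module and the matmul below are placeholders for the real custom Function): cast the inputs to the active autocast dtype only when autocast is actually enabled, then run the op with autocast turned off so the chosen dtype is what reaches the kernel.

```python
import torch


def _cast_if_autocast_enabled(*args):
    # No-op when autocast is off; otherwise cast to the current autocast dtype,
    # which covers both fp16 and bf16 without hard-coding cast_inputs=th.float16.
    if not torch.is_autocast_enabled():
        return args
    # _cast is the same private helper torch.cuda.amp.custom_fwd uses internally.
    return torch.cuda.amp.autocast_mode._cast(args, torch.get_autocast_gpu_dtype())


class ToySpMMModule(torch.nn.Module):
    def forward(self, a, b):
        args = _cast_if_autocast_enabled(a, b)
        # Disable autocast inside so the dtype chosen above is what the
        # underlying kernel / autograd Function actually sees.
        with torch.cuda.amp.autocast(enabled=False):
            return torch.matmul(*args)  # placeholder for a custom Function.apply
```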