[BUG]: AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base'

Open CreamyLong opened this issue 2 years ago • 1 comments

🐛 Describe the bug

When I ran the code from https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/resnet, I got this error:

AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base'

Then, following advice I found online, I commented out this code in colossalai/communication/collective.py:

_all_gather_func = dist._all_gather_base \
    if "all_gather_into_tensor" not in dir(dist) else dist.all_gather_into_tensor
_reduce_scatter_func = dist._reduce_scatter_base \
    if "reduce_scatter_tensor" not in dir(dist) else dist.reduce_scatter_tensor

After that, I got a different error:

ModuleNotFoundError: No module named 'torch.fx._compatibility'
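
For readers hitting the same AttributeError: the conditional lookup quoted above falls back to a private function without checking that it exists, which is what fails on an older torch. A minimal sketch of a more defensive lookup follows. The helper name `first_available` and the stand-in object `old_dist` are hypothetical, used only so the example runs without torch installed; this is not ColossalAI's actual code.

```python
from types import SimpleNamespace


def first_available(module, *names):
    """Return the first attribute of `module` found among `names`, else None."""
    for name in names:
        fn = getattr(module, name, None)
        if fn is not None:
            return fn
    return None


# Hypothetical stand-in for an older torch.distributed that only has
# _all_gather_base and none of the reduce-scatter variants.
old_dist = SimpleNamespace(_all_gather_base=lambda *args, **kwargs: "gathered")

# Prefer the public name, fall back to the private one, else None.
all_gather_func = first_available(
    old_dist, "all_gather_into_tensor", "_all_gather_base")
reduce_scatter_func = first_available(
    old_dist, "reduce_scatter_tensor", "_reduce_scatter_base")

if reduce_scatter_func is None:
    # A caller could raise a clear "please upgrade torch" error here
    # instead of failing with AttributeError at import time.
    print("reduce-scatter is unavailable in this torch.distributed")
```

With a real installation, `torch.distributed` would be passed in place of `old_dist`. This only makes the failure clearer; it does not make an unsupported torch version work.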

Environment

Python 3.6, torch 1.9.1+cu102, gtx3090

CreamyLong avatar Feb 13 '23 03:02 CreamyLong

Thank you for the notification. It is supposed to be supported by torch>=1.10. We will improve the compatibility as soon as possible.
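
A quick way to check an environment against the torch>=1.10 requirement mentioned above is to compare the major and minor version numbers. This is a rough sketch: the names `MIN_TORCH`, `parse_major_minor`, and `meets_minimum` are illustrative, and the parser assumes a version string that starts with `major.minor`, such as the `1.9.1+cu102` reported in this issue.

```python
import re

# Minimum version taken from the reply above (torch>=1.10).
MIN_TORCH = (1, 10)


def parse_major_minor(version):
    """Extract (major, minor) from a string like '1.9.1+cu102'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_TORCH):
    """Return True if `version` is at least `minimum` (major, minor)."""
    return parse_major_minor(version) >= minimum


print(meets_minimum("1.9.1+cu102"))  # False: the version from this issue
print(meets_minimum("1.10.0"))       # True
```

In a real environment the argument would be `torch.__version__`. Comparing tuples of integers avoids the string-comparison trap where "1.9" sorts after "1.10".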

kurisusnowdeng avatar Feb 13 '23 03:02 kurisusnowdeng

Hi @CreamyLong, please check the environment requirements: https://github.com/hpcaitech/ColossalAI#installation This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 18 '23 09:04 binmakeswell