
test_transformer_engine_executor.py is not runnable in a PyTorch build without distributed support

Open · IvanYashchuk opened this issue 11 months ago · 1 comment

🐛 Bug

Steps to reproduce (simulating a PyTorch build without distributed support; a monkeypatch alternative is sketched after the list):

  1. Modify the is_available() function to return False in torch/distributed/__init__.py
  2. Modify the is_available() function to return False in torch/distributed/rpc/__init__.py
  3. Run pytest thunder/tests/test_transformer_engine_executor.py --collect-only
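For convenience, steps 1–2 can be approximated without editing the installed files by monkeypatching before the import. This is only an illustrative sketch, not exactly equivalent to a real distributed-free build, since the compiled torch._C._distributed_c10d extension still exists here:

```python
# Approximate steps 1-2: pretend torch was built without distributed support.
# Must run before transformer_engine.pytorch is imported.
import torch.distributed
import torch.distributed.rpc

torch.distributed.is_available = lambda: False      # step 1
torch.distributed.rpc.is_available = lambda: False  # step 2
```

Running step 3 then fails during test collection: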
=============================================================================================== ERRORS ===============================================================================================
_________________________________________________________________ ERROR collecting thunder/tests/test_transformer_engine_executor.py _________________________________________________________________
ImportError while importing test module '/opt/pytorch/lightning-thunder/thunder/tests/test_transformer_engine_executor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
thunder/tests/test_transformer_engine_executor.py:13: in <module>
    import transformer_engine.pytorch as te
/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/__init__.py:64: in <module>
    from transformer_engine.pytorch.module import LayerNormLinear
/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/module/__init__.py:6: in <module>
    from .layernorm_linear import LayerNormLinear
/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py:15: in <module>
    from .base import (
/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/module/base.py:28: in <module>
    from ..distributed import (
/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/distributed.py:16: in <module>
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/__init__.py:1: in <module>
    from ._flat_param import FlatParameter as FlatParameter
/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_flat_param.py:45: in <module>
    from torch.testing._internal.distributed.fake_pg import FakeProcessGroup
/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/distributed/fake_pg.py:5: in <module>
    from torch._C._distributed_c10d import (
E   ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package
====================================================================================== short test summary info =======================================================================================
ERROR thunder/tests/test_transformer_engine_executor.py
ERROR thunder/tests/test_transformer_engine_executor.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================== no tests collected, 2 errors in 1.46s ================================================================================
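A possible mitigation on the Thunder side would be to guard the transformer_engine import at module level so collection skips cleanly on such builds. A minimal sketch, assuming a module-level skip is acceptable for this test file (the skip messages are mine, not the repo's actual code):

```python
import pytest
import torch.distributed

# Skip the whole module when PyTorch was built without distributed support,
# since transformer_engine.pytorch unconditionally imports torch.distributed.fsdp.
if not torch.distributed.is_available():
    pytest.skip(
        "transformer_engine.pytorch requires torch.distributed",
        allow_module_level=True,
    )

# importorskip also covers environments where transformer_engine is absent.
te = pytest.importorskip("transformer_engine.pytorch")
```

Alternatively, the fix might belong in Transformer Engine itself, by guarding its torch.distributed.fsdp import behind a torch.distributed.is_available() check.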

cc @borda

IvanYashchuk · Oct 28 '24 10:10