Megatron-DeepSpeed
Error in Installation due to circular import
I am trying to run tests on the codebase. I am using a Docker image on an AWS p3.2xlarge (Tesla V100):
docker pull nvcr.io/nvidia/pytorch:21.10-py3
Running python -m pip install -e . returns the error:
ERROR: Command errored out with exit status 1:
command: /workspace/Megatron-DeepSpeed/megatron/bin/python /workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpw_grs406
cwd: /workspace/Megatron-DeepSpeed
Complete output (20 lines):
Traceback (most recent call last):
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 280, in <module>
main()
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-ijyc1r97/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 154, in get_requires_for_build_wheel
return self._get_build_requires(
File "/tmp/pip-build-env-ijyc1r97/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 135, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-ijyc1r97/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 258, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-ijyc1r97/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 150, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 25, in <module>
from megatron.package_info import (
File "/workspace/Megatron-DeepSpeed/megatron/__init__.py", line 15, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
----------------------------------------
Upon further investigation, I changed into the megatron directory and tried import torch, but got this error:
Python 3.8.12 | packaged by conda-forge | (default, Sep 29 2021, 19:52:28)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/__init__.py", line 643, in <module>
from .functional import * # noqa: F403
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/functional.py", line 6, in <module>
import torch.nn.functional as F
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/nn/__init__.py", line 1, in <module>
from .modules import * # noqa: F403
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/nn/modules/__init__.py", line 2, in <module>
from .linear import Identity, Linear, Bilinear, LazyLinear
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 6, in <module>
from .. import functional as F
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/nn/functional.py", line 11, in <module>
from .._jit_internal import boolean_dispatch, _overload, BroadcastingList1, BroadcastingList2, BroadcastingList3
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/_jit_internal.py", line 24, in <module>
import torch.distributed.rpc
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/distributed/__init__.py", line 52, in <module>
from .distributed_c10d import * # noqa: F403
File "/workspace/Megatron-DeepSpeed/megatron/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3, in <module>
import logging
File "/workspace/Megatron-DeepSpeed/megatron/logging.py", line 22, in <module>
from logging import CRITICAL # NOQA
ImportError: cannot import name 'CRITICAL' from partially initialized module 'logging' (most likely due to a circular import) (/workspace/Megatron-DeepSpeed/megatron/logging.py)
It seems that logging.py is affecting torch?
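The traceback above shows torch's own import logging resolving to /workspace/Megatron-DeepSpeed/megatron/logging.py rather than the standard library. That is consistent with the interpreter having been started inside the megatron directory, where the current directory is searched before the standard library. As a rough sketch (the helper name find_stdlib_shadows is hypothetical and not part of the repo), one way to spot such files is to compare file names in a directory against the standard-library module names:

```python
import sys
from pathlib import Path


def find_stdlib_shadows(directory):
    """Return names of .py files in `directory` that share a name with a stdlib module."""
    # sys.stdlib_module_names exists on Python 3.10+. On older interpreters
    # (such as the 3.8 shown in this thread) fall back to a small hand-picked
    # set, which is an assumption and far from complete.
    stdlib = getattr(
        sys, "stdlib_module_names", {"logging", "random", "json", "types", "typing"}
    )
    return sorted(p.stem for p in Path(directory).glob("*.py") if p.stem in stdlib)
```

Calling find_stdlib_shadows("megatron") from the repository root would be expected to report logging, given the file shown in the traceback. Renaming the file, or not starting Python from inside that directory, should avoid this particular error.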
I was able to reproduce on a clean env.
It's weird. I fixed the second point by renaming the file to logging_utils.py instead. However, that didn't fix the installation error, so I think the two issues are unrelated.
Running python setup.py install worked ... I'm not sure why one fails and not the other ...
So I tried tackling this again. Turns out pip install -e . --no-use-pep517 works. I'm still unclear why that is.
Yeah, there are multiple issues with circular imports in megatron-lm. We fixed some but more are popping up.
And I don't think it was designed to be installed. Remember, it has to compile the CUDA kernels that live in the source tree. DeepSpeed has a way to do that which works well, but Megatron does not.
Turns out pip install -e . --no-use-pep517 works. I'm still unclear why that is.
Can this somehow be enabled automatically in setup.py? That's too hard to remember.
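A possible alternative to remembering the flag, offered as a sketch rather than a confirmed fix: the first traceback shows setup.py importing megatron.package_info, which runs megatron/__init__.py and its import torch, all inside pip's temporary build environment (the /tmp/pip-build-env-... paths), where torch is not installed. If that is indeed the cause, setup.py could read package_info.py by path instead of importing the package. The helper name load_package_info is hypothetical, and this assumes package_info.py contains only plain assignments with no imports from megatron:

```python
import runpy
from pathlib import Path


def load_package_info(repo_root="."):
    """Execute megatron/package_info.py as a standalone file and return its globals.

    Because the file is run by path, megatron/__init__.py (and therefore
    torch) is never imported.
    """
    info_path = Path(repo_root) / "megatron" / "package_info.py"
    # runpy.run_path returns the resulting module globals as a dict.
    return runpy.run_path(str(info_path))
```

In setup.py this would replace the from megatron.package_info import (...) statement, for example info = load_package_info() followed by lookups such as info["__version__"] (the exact variable names depend on what package_info.py defines). This does not address compiling the CUDA kernels mentioned above.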