conda installed environment cannot import classes/methods from megatron

Open ryanyxw opened this issue 3 months ago • 3 comments

Describe the bug

I created a conda environment and installed NeMo from source. However, when I run from nemo.collections import llm, I get the following error:

from nemo.collections import llm

/root/.conda/envs/nemo/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Import of quick_gelu from megatron.core.fusions.fused_bias_geglu failed with: Traceback (most recent call last):
  File "/root/brainstorm/NeMo/nemo/utils/import_utils.py", line 319, in safe_import_from
    return getattr(imported_module, symbol), True
AttributeError: module 'megatron.core.fusions.fused_bias_geglu' has no attribute 'quick_gelu'

WARNING: transformer_engine not installed. Using default recipe.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/brainstorm/NeMo/nemo/collections/llm/__init__.py", line 52, in <module>
    from nemo.collections.llm.gpt.model import (  # noqa: F401
  File "/root/brainstorm/NeMo/nemo/collections/llm/gpt/model/__init__.py", line 65, in <module>
    from nemo.collections.llm.gpt.model.hyena import (
  File "/root/brainstorm/NeMo/nemo/collections/llm/gpt/model/hyena.py", line 34, in <module>
    from nemo.collections.llm.gpt.model.megatron.hyena.hyena_model import HyenaModel as MCoreHyenaModel
  File "/root/brainstorm/NeMo/nemo/collections/llm/gpt/model/megatron/hyena/hyena_model.py", line 30, in <module>
    from megatron.core.process_groups_config import ProcessGroupCollection
ImportError: cannot import name 'ProcessGroupCollection' from 'megatron.core.process_groups_config' (/root/.conda/envs/nemo/lib/python3.10/site-packages/megatron/core/process_groups_config.py)

Steps/Code to reproduce bug

My environment setup is as follows:

conda create -n nemo python==3.10.12
pip3 install torch torchvision
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
git checkout main # checkout main branch of nemo
pip install -e '.[all]'

Expected behavior

I expected the package to import smoothly.

Environment details

I'm using a conda environment.

  • OS version - Ubuntu 22.04.5 LTS
  • PyTorch version - Stable (2.8.0)
  • Python version - 3.10.12

ryanyxw avatar Sep 17 '25 19:09 ryanyxw

Did you solve it? I'm also having the same problem.😭😭😭

Cherenkov-Pavel avatar Sep 19 '25 09:09 Cherenkov-Pavel

Not yet... I think the only thing that reliably works is Docker, but I'd still prefer to use conda, so ideally it'd be nice to have this fixed.

ryanyxw avatar Sep 22 '25 19:09 ryanyxw

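Try reinstalling Megatron-LM from the repository to pick up the latest changes; the installed megatron-core appears to be older than what NeMo's main branch expects: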

pip uninstall megatron-core
pip install git+https://github.com/NVIDIA/Megatron-LM.git
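
To check whether the reinstall actually picked up a suitable megatron-core, a small script along these lines might help. It is only a sketch: the two module/symbol pairs are copied from the traceback earlier in this thread, the helper names (installed_version, check_symbols) are made up for illustration, and it assumes the pip distribution is named megatron-core, as in the uninstall command above.

```python
import importlib
from importlib import metadata

# Module/symbol pairs copied from the traceback in this issue.
REQUIRED = [
    ("megatron.core.process_groups_config", "ProcessGroupCollection"),
    ("megatron.core.fusions.fused_bias_geglu", "quick_gelu"),
]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_symbols(requirements):
    """Map 'module.symbol' to True/False depending on whether it can be resolved."""
    results = {}
    for module_name, symbol in requirements:
        key = f"{module_name}.{symbol}"
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is missing or fails to import.
            results[key] = False
            continue
        results[key] = hasattr(module, symbol)
    return results


if __name__ == "__main__":
    print("megatron-core version:", installed_version("megatron-core"))
    for name, found in check_symbols(REQUIRED).items():
        print(("OK      " if found else "MISSING ") + name)
```

If either symbol is still reported as MISSING after reinstalling, the environment is probably still resolving to an older megatron-core than the NeMo checkout expects.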

zoxxxx avatar Oct 02 '25 16:10 zoxxxx