[Bug] ModuleNotFoundError: No module named 'torchdata.datapipes' on Linux aarch64 (DGL incompatibility)

Open rudgmleo opened this issue 5 months ago • 8 comments

❓ Questions and Help

Before proceeding, please note that we recommend using our discussion forum (https://discuss.dgl.ai) for general questions. As a result, this issue will likely be CLOSED shortly.

Dear RFdiffusion / DGL Team,

I am writing to report a persistent issue I'm encountering while trying to set up RFdiffusion on a Linux aarch64 (ARM64) system. I would greatly appreciate any guidance or known solutions.

Problem Description: When attempting to import RFdiffusion's core modules, I consistently get a ModuleNotFoundError: No module named 'torchdata.datapipes'. This error originates within the dgl library, specifically when it tries to import the datapipes submodule from torchdata.

Environment:

  • Operating System: Linux aarch64 (Ubuntu 24.04.1 LTS)
  • Python Version: 3.9 (Conda environment)
  • PyTorch Version: 2.0.0 (CPU-only build from conda-forge)
  • TorchVision Version: 0.15.2
  • TorchAudio Version: 2.0.0 (installed via pip)
  • CUDA Toolkit: 11.8 (Installed via Conda, but PyTorch build is CPU)
  • DGL Version:
    • conda install attempts for versions 0.8.1, 0.9.1, 1.0.0, 1.1.2 (+cu118) resulted in PackagesNotFoundError (Conda could not find/resolve them).
    • pip install dgl (latest, 2.1.0) successfully installs DGL.
  • TorchData Version: 0.11.0 (installed via pip)
  • RFdiffusion Source: https://github.com/RosettaCommons/RFdiffusion
  • ColabDesign Source: https://github.com/sokrypton/ColabDesign
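
For a report like this, the versions above can be gathered programmatically instead of by hand. What follows is a minimal sketch, not an official DGL or RFdiffusion tool: the function name `environment_report` is made up for illustration, and the default package names are assumed to be the pip distribution names of the packages listed above.

```python
import platform
from importlib import metadata


def environment_report(packages=("torch", "torchdata", "dgl")):
    """Return the machine architecture plus the installed version of each package.

    Packages that are not installed map to None instead of raising.
    """
    report = {
        "machine": platform.machine(),        # e.g. "aarch64" or "x86_64"
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the activated Conda environment prints one line per entry, which can be pasted straight into the issue.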

Steps to Reproduce:

  1. Switched to a Python 3.9 environment.
    • Created a base environment using conda create -n SE3nv python=3.9.
    • Installed PyTorch ecosystem (pytorch==2.0.0, torchvision==0.15.2, cudatoolkit=11.8) from conda-forge. (This step succeeded.)
    • Installed DGL latest version (2.1.0) via pip install dgl. (This step succeeded.)
    • Installed TorchAudio 2.0.0 via pip install torchaudio==2.0.0. (This step succeeded.)
    • Installed RFdiffusion/env/SE3Transformer and ColabDesign (pip install -e .).
    • Installed remaining pip dependencies (omegaconf, icecream, pyrsistent, matplotlib, ipywidgets, py3Dmol, jupyterlab, etc.).
  2. Attempted python -c "from inference.utils import parse_pdb".

Expected Behavior: RFdiffusion modules should import successfully.

Actual Behavior (Traceback): ModuleNotFoundError: No module named 'torchdata.datapipes'. Traceback (most recent call last): File "
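
One way to capture the complete traceback for a report is to run the failing import in a child interpreter and keep its stderr. The sketch below assumes only the standard library; `capture_import_traceback` is a hypothetical helper name, and the statement passed to it is the one from step 2.

```python
import subprocess
import sys


def capture_import_traceback(statement):
    """Run `statement` in a fresh interpreter and return (exit_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Run this from the RFdiffusion checkout so that `inference` is importable.
    code, stderr = capture_import_traceback("from inference.utils import parse_pdb")
    print(f"exit code: {code}")
    print(stderr)
```

A non-zero exit code together with the stderr text gives the full chain of imports that leads to the missing module.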

Additional Context (What I've Tried):

  • Attempted installation in a Python 3.10 environment, which failed due to DGL, TorchData, and TorchAudio version conflicts.
  • For Python 3.9, conda install attempts for DGL versions (0.8.1, 0.9.1, 1.0.0, 1.1.2 with cu118 label) from dglteam and defaults channels consistently failed with PackagesNotFoundError. This indicates Conda cannot find or resolve these aarch64 builds.
  • Manually checked ~/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torchdata/ via ls -l and confirmed that the datapipes directory is physically missing. This strongly suggests a structural mismatch between DGL 2.1.0's requirements and the available TorchData 0.11.0 build for aarch64.
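
The manual directory check in the last bullet can also be done from Python. This is a minimal sketch using `importlib.util.find_spec`, which locates a module without importing the module itself (it does import the parent package). The helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if `dotted_name` can be located, without importing it.

    find_spec imports parent packages, so a missing parent raises
    ModuleNotFoundError; that case is reported as False as well.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    for name in ("torchdata", "torchdata.datapipes", "dgl"):
        print(f"{name}: {'found' if has_module(name) else 'missing'}")
```

If `torchdata` is reported as found but `torchdata.datapipes` as missing, the installed TorchData release does not ship that subpackage, which matches the observation above.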

Question: Are there any known working dgl / torchdata version combinations or specific installation instructions for linux-aarch64 that successfully resolve this torchdata.datapipes issue? Are there any official aarch64 Docker images or specific Dockerfile modifications known to work?

Thank you for your time and assistance.

rudgmleo • Jul 22 '25 09:07

Hey @rudgmleo. I was getting the same ModuleNotFoundError for torchdata.datapipes.

What worked for me, with the venv and system specs below, was following the installation instructions here: https://www.dgl.ai/pages/start.html

My specs:

venv: Python 3.12.3

pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html

System:

  • Kernel: 6.8.0-51-generic arch: x86_64 bits: 64
  • Desktop: Cinnamon v: 6.4.6 Distro: Linux Mint 22.1 Xia

CPU:

  • Info: 6-core model: Intel Core i7-9750H bits: 64 type: MT MCP cache: L2: 1.5 MiB
  • Speed (MHz): avg: 800 min/max: 800/4500 cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 801 12: 800

Graphics:

  • Device-1: NVIDIA TU117M [GeForce GTX 1650 Mobile / Max-Q] driver: nvidia v: 575.64.03
  • API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 575.64.03 renderer: NVIDIA GeForce GTX 1650/PCIe/SSE2

You can adjust your specs accordingly:

Image

Gil32610 • Jul 23 '25 20:07

Hello @Gil32610,

Thank you for your response and for sharing your solution! It's very helpful to know that you encountered the exact same ModuleNotFoundError: No module named 'torchdata.datapipes' and managed to resolve it. This confirms the root cause of the issue. However, after reviewing your specs and solution, I believe the specific fix you used is unfortunately not directly applicable to my linux-aarch64 (ARM64) system.

Reason for non-applicability: Your dgl installation command points to a cu124 (CUDA 12.4) build of DGL (https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html). This build is specifically compiled for x86_64 architectures. My server is aarch64, which means these x86_64 specific builds are incompatible.

Upon inspecting https://data.dgl.ai/wheels/torch-2.4/cpu/ for aarch64 CPU builds, the available dgl version is 2.1.0 (e.g., dgl-2.1.0-cp39-cp39-manylinux2014_aarch64.whl). As I detailed in my original issue, dgl-2.1.0 (when installed on aarch64) still results in the ModuleNotFoundError: No module named 'torchdata.datapipes', because the torchdata==0.11.0 package for aarch64 physically lacks the datapipes module that dgl 2.1.0 requires.
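
Whether a given wheel targets the local CPU architecture can be read off its filename, since the platform tag is the last dash-separated field. The sketch below is illustrative only: `KNOWN_ARCHES` is a short, non-exhaustive list chosen for this example, and both function names are hypothetical. For real installs, pip performs this matching itself.

```python
import platform

# Architecture suffixes that commonly end a wheel's platform tag (not exhaustive).
KNOWN_ARCHES = ("x86_64", "aarch64", "amd64", "arm64", "i686")


def wheel_arches(filename):
    """Return the set of CPU architectures named in a wheel's platform tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Wheel names look like name-version[-build]-python-abi-platform.whl,
    # so the platform tag is always the last dash-separated field.
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    arches = set()
    for tag in platform_tag.split("."):  # compressed tag sets are dot-separated
        for arch in KNOWN_ARCHES:
            if tag.endswith("_" + arch):
                arches.add(arch)
    return arches


def wheel_matches_machine(filename, machine=None):
    """True if the wheel is platform-independent ("any") or names `machine`."""
    machine = machine or platform.machine()
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    return platform_tag == "any" or machine in wheel_arches(filename)


if __name__ == "__main__":
    name = "dgl-2.1.0-cp39-cp39-manylinux2014_aarch64.whl"
    print(wheel_arches(name), wheel_matches_machine(name))
```

For the filename quoted above, the parsed architecture is aarch64, so the wheel itself matches an ARM64 Linux host; the failure comes from the TorchData side rather than from the DGL wheel's platform tag.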

Therefore, the core challenge of finding a compatible dgl / torchdata build for linux-aarch64 that resolves the torchdata.datapipes issue remains. Do you or anyone else have insights into a working dgl / torchdata combination specifically for aarch64?

Thank you for your understanding and assistance.

schoyeon • Jul 24 '25 01:07

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] • Aug 24 '25 01:08

I had the same issue with DGL on aarch64. What I ended up doing was building dgl 2.4.0 from source for aarch64 (it was a bit painful).

It looks like dgl 2.4.0 does not depend on torchdata. If you are referring to RFD2, let me know. I installed torch 2.4.0 for aarch64 (with CUDA 12.4), so I finally have RFD2 running on this architecture with a GPU.

ramithuh • Sep 09 '25 16:09

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] • Oct 11 '25 01:10

Setting torchdata==0.7.1 is what finally fixed this for me.
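
A quick guard against picking up a TorchData release without `datapipes` is to compare the installed version numerically. In the sketch below, the cutoff `first_without=(0, 10)` is an assumption, not something this thread establishes: the thread only shows 0.7.1 working and 0.11.0 failing. Both helper names are hypothetical. Versions are parsed into integer tuples because plain string comparison misorders them.

```python
def parse_release(version):
    """Turn '0.11.0' into (0, 11, 0), ignoring a local suffix such as '+cpu'."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev1"
        parts.append(int(digits))
    return tuple(parts)


def likely_has_datapipes(torchdata_version, first_without=(0, 10)):
    """Guess whether this TorchData version still ships torchdata.datapipes.

    `first_without` is an assumed cutoff; adjust it if it proves wrong.
    """
    return parse_release(torchdata_version) < first_without


if __name__ == "__main__":
    for v in ("0.7.1", "0.11.0"):
        print(v, likely_has_datapipes(v))
```

The numeric comparison matters here: as strings, "0.11.0" sorts before "0.7.1", which would give the wrong answer.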

sedfanne • Oct 14 '25 19:10

I have the same problem with Python 3.10, CUDA 12.1, PyTorch 2.3.1, and torchdata 0.11.0.

lll123github • Nov 13 '25 14:11

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] • Dec 15 '25 01:12