[Bug] ModuleNotFoundError: No module named 'torchdata.datapipes' on Linux aarch64 (DGL incompatibility)
Dear RFdiffusion / DGL Team,
I am writing to report a persistent issue I'm encountering while trying to set up RFdiffusion on a Linux aarch64 (ARM64) system. I would greatly appreciate any guidance or known solutions.
Problem Description:
When attempting to import RFdiffusion's core modules, I consistently get a ModuleNotFoundError: No module named 'torchdata.datapipes'. This error originates within the dgl library, specifically when it tries to import the datapipes submodule from torchdata.
Environment:
- Operating System: Linux aarch64 (Ubuntu 24.04.1 LTS)
- Python Version: 3.9 (Conda environment)
- PyTorch Version: 2.0.0 (CPU-only build from conda-forge)
- TorchVision Version: 0.15.2
- TorchAudio Version: 2.0.0 (installed via pip)
- CUDA Toolkit: 11.8 (installed via Conda, but the PyTorch build is CPU-only)
- DGL Version: conda install attempts for versions 0.8.1, 0.9.1, 1.0.0, 1.1.2 (+cu118) resulted in PackagesNotFoundError (Conda could not find/resolve them). pip install dgl (latest, 2.1.0) successfully installs DGL.
- TorchData Version: 0.11.0 (installed via pip)
- RFdiffusion Source: https://github.com/RosettaCommons/RFdiffusion
- ColabDesign Source: https://github.com/sokrypton/ColabDesign
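An environment listing like the one above can also be collected programmatically, which avoids transcription mistakes when comparing setups. The following is a minimal sketch using only the standard library; the helper names collect_versions and environment_report are hypothetical, and the default package list is simply the three packages discussed in this issue.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def environment_report(packages=("torch", "torchdata", "dgl")):
    """Gather CPU architecture, Python version, and package versions."""
    report = {
        "machine": platform.machine(),        # e.g. 'aarch64' or 'x86_64'
        "python": platform.python_version(),
    }
    report.update(collect_versions(packages))
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this inside the SE3nv environment should show at a glance which torchdata and dgl versions pip actually resolved.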
Steps to Reproduce:
- Switched to a Python 3.9 environment.
- Created a base environment using conda create -n SE3nv python=3.9.
- Installed PyTorch ecosystem (pytorch==2.0.0, torchvision==0.15.2, cudatoolkit=11.8) from conda-forge. (This step succeeded.)
- Installed DGL latest version (2.1.0) via pip install dgl. (This step succeeded.)
- Installed TorchAudio 2.0.0 via pip install torchaudio==2.0.0. (This step succeeded.)
- Installed RFdiffusion/env/SE3Transformer and ColabDesign (pip install -e .).
- Installed remaining pip dependencies (omegaconf, icecream, pyrsistent, matplotlib, ipywidgets, py3Dmol, jupyterlab, etc.).
- Attempted python -c "from inference.utils import parse_pdb".
Expected Behavior: RFdiffusion modules should import successfully.
Actual Behavior (Traceback):
(Please copy and paste the full traceback from your most recent ModuleNotFoundError: No module named 'torchdata.datapipes' here. For example:)
Traceback (most recent call last):
File "
Additional Context (What I've Tried):
- Attempted installation in Python 3.10 environment, but failed due to DGL, TorchData, and TorchAudio version conflicts.
- For Python 3.9, conda install attempts for DGL versions (0.8.1, 0.9.1, 1.0.0, 1.1.2 with cu118 label) from the dglteam and defaults channels consistently failed with PackagesNotFoundError. This indicates Conda cannot find or resolve these aarch64 builds.
- Manually checked ~/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torchdata/ via ls -l and confirmed that the datapipes directory is physically missing. This strongly suggests a structural mismatch between DGL 2.1.0's requirements and the available TorchData 0.11.0 build for aarch64.
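The manual directory check above can be reproduced from Python without triggering DGL's own import chain. The following is a minimal sketch, assuming only the standard library; has_submodule is a hypothetical helper name. Note that locating a dotted name imports the parent package (here torchdata) but not the submodule itself.

```python
import importlib.util


def has_submodule(package, submodule):
    """Return True if `package.submodule` can be located on this interpreter."""
    # A top-level lookup does not import anything; None means "not installed".
    if importlib.util.find_spec(package) is None:
        return False
    try:
        # Looking up a dotted name imports the parent package first.
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # Raised when the parent is a plain module rather than a package.
        return False


if __name__ == "__main__":
    print("torchdata.datapipes present:", has_submodule("torchdata", "datapipes"))
```

If this prints False while torchdata is installed, the installed torchdata build does not ship the datapipes submodule, which matches the symptom described in this issue.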
Question:
Are there any known working dgl / torchdata version combinations or specific installation instructions for linux-aarch64 that successfully resolve this torchdata.datapipes issue? Are there any official aarch64 Docker images or specific Dockerfile modifications known to work?
Thank you for your time and assistance.
Hey @rudgmleo. I was hitting the same ModuleNotFoundError for torchdata.datapipes.
What worked for me, with the venv and system specs below, was following the installation instructions here: https://www.dgl.ai/pages/start.html
My specs:
venv: Python 3.12.3
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
System:
- Kernel: 6.8.0-51-generic arch: x86_64 bits: 64
- Desktop: Cinnamon v: 6.4.6 Distro: Linux Mint 22.1 Xia
CPU:
- Info: 6-core model: Intel Core i7-9750H bits: 64 type: MT MCP cache: L2: 1.5 MiB
- Speed (MHz): avg: 800 min/max: 800/4500 cores: 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 801 12: 800
Graphics:
- Device-1: NVIDIA TU117M [GeForce GTX 1650 Mobile / Max-Q] driver: nvidia v: 575.64.03
- API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 575.64.03 renderer: NVIDIA GeForce GTX 1650/PCIe/SSE2
You can adjust your specs accordingly.
Hello @Gil32610,
Thank you for your response and for sharing your solution! It's very helpful to know that you encountered the exact same ModuleNotFoundError: No module named 'torchdata.datapipes' and managed to resolve it. This confirms the root cause of the issue.
However, after reviewing your specs and solution, I believe the specific fix you used is unfortunately not directly applicable to my linux-aarch64 (ARM64) system.
Reason for non-applicability:
Your dgl installation command points to a cu124 (CUDA 12.4) build of DGL (https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html). This build is specifically compiled for x86_64 architectures. My server is aarch64, which means these x86_64 specific builds are incompatible.
Upon inspecting https://data.dgl.ai/wheels/torch-2.4/cpu/ for aarch64 CPU builds, the available dgl version is 2.1.0 (e.g., dgl-2.1.0-cp39-cp39-manylinux2014_aarch64.whl). As I detailed in my original issue, dgl-2.1.0 (when installed on aarch64) still results in the ModuleNotFoundError: No module named 'torchdata.datapipes', because the torchdata==0.11.0 package for aarch64 physically lacks the datapipes module that dgl 2.1.0 requires.
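The architecture mismatch described above can be read directly off a wheel's filename, since the last dash-separated field is the platform tag. The following is a simplified sketch, not a substitute for pip's real compatibility logic; the helper names wheel_platform_tags and wheel_matches_machine are hypothetical, and the check only compares the architecture suffix of each tag.

```python
import platform


def wheel_platform_tags(filename):
    """Return the platform tag(s) from a wheel filename as a list."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: at least five fields.
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    # Several platform tags may be joined with '.' in one filename.
    return parts[-1].split(".")


def wheel_matches_machine(filename, machine=None):
    """Rough check: does any platform tag end in this machine's architecture?"""
    machine = machine or platform.machine()
    return any(
        tag == "any" or tag.endswith("_" + machine)
        for tag in wheel_platform_tags(filename)
    )


if __name__ == "__main__":
    name = "dgl-2.1.0-cp39-cp39-manylinux2014_aarch64.whl"
    print(wheel_platform_tags(name), wheel_matches_machine(name))
```

This ignores the Python and ABI tags and the glibc version encoded in manylinux tags, so a True result only means the CPU architecture lines up.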
Therefore, the core challenge of finding a compatible dgl / torchdata build for linux-aarch64 that resolves the torchdata.datapipes issue remains.
Do you or anyone else have insights into a working dgl / torchdata combination specifically for aarch64?
Thank you for your understanding and assistance.
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
I had the same issue with DGL on aarch64. What I ended up doing was building DGL 2.4.0 from source for aarch64 (it was a bit painful).
It looks like DGL 2.4.0 does not depend on torchdata. If you are referring to RFD2, let me know. I installed torch 2.4.0 for aarch64 (with CUDA 12.4 as well), so I finally have RFD2 running on this architecture and GPU.
Setting torchdata==0.7.1 is what finally fixed this for me.
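To compare an installed torchdata against the versions mentioned in this thread, a small lookup like the one below may help. It is only a sketch: the table contains nothing beyond what commenters here reported (0.7.1 working for one user, 0.11.0 missing datapipes), it is not an official compatibility matrix, and the names REPORTED, thread_status, and installed_torchdata_status are hypothetical.

```python
from importlib import metadata

# Anecdotal reports from this thread only; DGL version and platform varied.
REPORTED = {
    "0.7.1": "reported working",
    "0.11.0": "reported missing datapipes",
}


def thread_status(version):
    """Map a torchdata version string to what this thread reported for it."""
    return REPORTED.get(version, "no report in this thread")


def installed_torchdata_status():
    """Return (version, status) for the torchdata in the current environment."""
    try:
        version = metadata.version("torchdata")
    except metadata.PackageNotFoundError:
        return None, "torchdata is not installed"
    return version, thread_status(version)


if __name__ == "__main__":
    print(installed_torchdata_status())
```

Whether pip can resolve torchdata==0.7.1 alongside a given PyTorch build on aarch64 is not confirmed anywhere in this thread, so treat the pin as something to try rather than a guaranteed fix.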
I have the same problem with Python 3.10, CUDA 12.1, PyTorch 2.3.1, and torchdata 0.11.0.