
Deprecate DGL: CPU wheels frozen (no Torch >= 2.5 support) and slow ABI updates

Open · lmeyerov opened this issue · 0 comments

Summary

DGL support on CPU is effectively frozen and lags behind Torch ABI upgrades. The latest CPU wheels stop at DGL 2.1.0 (paired with Torch ~2.0); newer DGL releases (e.g., 2.4.0) publish only CUDA wheels. As a result, with Torch 2.5–2.9 we cannot run DGL-based features or tests on CPU without building DGL from source. We should move to PyTorch Geometric (PyG), which is actively maintained.

Evidence

  • DGL wheel index (https://data.dgl.ai/wheels/repo.html) shows CPU wheels only up to 2.1.0; 2.4.0 is CUDA-only.
  • Installing Torch 2.8/2.9 alongside CPU DGL fails: GraphBolt depends on torchdata, which pins Torch to 2.0.x.
  • Our server Dockerfile uses Torch 2.9.0 + DGL 2.4.0 cu124 (GPU works), but CI CPU runners cannot match it.
  • DGL has shipped no CPU ABI updates in ~16 months.

Impact

  • DGL-dependent tests (embed_utils, networks) break or get skipped on CPU with current Torch.
  • CI CPU matrix cannot validate DGL paths, increasing regression risk.

Proposal

  • Begin deprecating DGL and plan a migration to PyTorch Geometric (PyG) for GNN features.
  • Document the supported Torch/DGL matrix:
    • CPU: Torch ~2.0 + DGL 2.1.0 only
    • GPU: Torch 2.8/2.9 + DGL 2.4.0 cu124
  • Mark CPU DGL tests as xfail or disable them until the migration is complete.
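The supported matrix above could be encoded as a small check that CI or an import guard consults before enabling DGL paths. This is a minimal sketch under stated assumptions: `SUPPORTED_MATRIX`, `torch_minor`, and `is_supported` are hypothetical names (not part of pygraphistry), Torch versions are compared at major.minor granularity, and local version suffixes such as `+cu124` are stripped before comparison.

```python
# Hypothetical encoding of the supported Torch/DGL matrix from this issue.
# Keys are (device, DGL version); values are the Torch major.minor versions
# that combination is expected to work with.
SUPPORTED_MATRIX = {
    ("cpu", "2.1.0"): {"2.0"},
    ("cuda", "2.4.0"): {"2.8", "2.9"},
}


def strip_local(version: str) -> str:
    """Drop a local version suffix, e.g. '2.4.0+cu124' -> '2.4.0'."""
    return version.split("+", 1)[0]


def torch_minor(version: str) -> str:
    """Reduce a Torch version string such as '2.9.0+cu124' to '2.9'."""
    major, minor = strip_local(version).split(".")[:2]
    return f"{major}.{minor}"


def is_supported(device: str, dgl_version: str, torch_version: str) -> bool:
    """Return True if (device, DGL, Torch) appears in the documented matrix."""
    allowed = SUPPORTED_MATRIX.get((device, strip_local(dgl_version)))
    return allowed is not None and torch_minor(torch_version) in allowed
```

Keeping the matrix as plain data means the documentation table and the CI check can be updated together when the PyG migration lands and DGL rows are removed.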

Tasks

  • Update CI and guides to align with the supported matrix; explicitly skip CPU DGL tests.
  • Draft the PyG migration plan (feature parity, loaders, batching).
  • Update documentation to reflect DGL limitations and upcoming deprecation.
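To skip CPU DGL tests explicitly, a test suite could compute a skip reason once per module. This is a hedged sketch: `dgl_skip_reason` is a hypothetical helper, and it uses `importlib.util.find_spec` so it can run on runners where DGL or Torch is not installed at all, importing Torch only after confirming it exists.

```python
import importlib.util
from typing import Optional


def dgl_skip_reason() -> Optional[str]:
    """Return a reason to skip DGL tests on this runner, or None to run them."""
    # find_spec checks installability without importing the package.
    if importlib.util.find_spec("dgl") is None:
        return "dgl is not installed"
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch  # imported lazily so the check works without torch

    # Per this issue, CPU DGL is not validated in CI; only GPU runs DGL tests.
    if not torch.cuda.is_available():
        return "DGL tests are skipped on CPU until the PyG migration"
    return None
```

A DGL test module could then apply it to every test through pytest's module-level marker, for example `reason = dgl_skip_reason()` followed by `pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")`. Returning a reason string rather than a bare boolean keeps each skip visible in CI output, which is the point of skipping explicitly.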

lmeyerov · Nov 30 '25 07:11