dgl
dgl copied to clipboard
[GraphBolt] TorchData Pytorch support
🔨Work Item
IMPORTANT:
- This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
- DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.
Project tracker: https://github.com/orgs/dmlc/projects/2
Description
https://github.com/pytorch/pytorch/issues/124907#issuecomment-2077135173
Here, torch developers say that future versions of pytorch may not support torchdata properly. It might become a problem to support later PyTorch versions.
Previously we're trying to deprecate torchdata
with torch.utils.data
for datapipe-related operations as active development and release of torchdata
have been paused(mentioned here).
So for now, both pytorch and torchdata team are deprecating torchdata?
I don't know the exact details. We need to look into it as it is a crucial dependency.
https://discuss.dgl.ai/t/importerror-cannot-import-name-dill-available-from-torch-utils-data-datapipes-utils-common/4363/2 might be a related problem, I saw a PR in torch repo that fix this issue.
The way we implement DataLoader (https://github.com/dmlc/dgl/blob/658b2086b09bbd76c3d3f488af2b155a1c921052/python/dgl/graphbolt/dataloader.py#L79C7-L79C17) right now isn't perfect. It makes a lot of assumption that might cause problems later. Once those problems hit, we should redesign it. We held off because the torch.data already does a good job, but if we have to, we'll tackle it then.
@frozenbugs @Rhett-Ying https://github.com/pytorch/data/pull/1277, we need to pin torchdata<0.8.0
to avoid the deprecation message.
@frozenbugs I bet we can implement IterDataPipe in less than 100 lines of code and get rid of our dependency forever. MapDataPipe would be 10 lines of code once IterDataPipe is in place. The utility functions should also be doable.
Archived: It is probably not an option to pin torchdata < 0.8 since it has dependency on torch core, which will lead to dependency on 2 different torch version https://github.com/dmlc/dgl/pull/7604.
#7609 has removed torchdata imports from GraphBolt. We still need to remove torchdata as a dependency.
#7609 has removed torchdata imports from GraphBolt. We still need to remove torchdata as a dependency.
Pending https://github.com/dmlc/dgl/pull/7667
Close this for now, let's see what's new from Torch.