dgl icon indicating copy to clipboard operation
dgl copied to clipboard

[GraphBolt] TorchData Pytorch support

Open mfbalin opened this issue 9 months ago • 4 comments

🔨Work Item

IMPORTANT:

  • This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
  • DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

https://github.com/pytorch/pytorch/issues/124907#issuecomment-2077135173

Here, torch developers say that future versions of pytorch may not support torchdata properly. It might become a problem to support later PyTorch versions.

mfbalin avatar Apr 27 '24 00:04 mfbalin

Previously we're trying to deprecate torchdata with torch.utils.data for datapipe-related operations as active development and release of torchdata have been paused(mentioned here).

So for now, both pytorch and torchdata team are deprecating torchdata?

Rhett-Ying avatar Apr 28 '24 01:04 Rhett-Ying

I don't know the exact details. We need to look into it as it is a crucial dependency.

mfbalin avatar Apr 28 '24 01:04 mfbalin

https://discuss.dgl.ai/t/importerror-cannot-import-name-dill-available-from-torch-utils-data-datapipes-utils-common/4363/2 might be a related problem, I saw a PR in torch repo that fix this issue.

mfbalin avatar Apr 28 '24 01:04 mfbalin

The way we implement DataLoader (https://github.com/dmlc/dgl/blob/658b2086b09bbd76c3d3f488af2b155a1c921052/python/dgl/graphbolt/dataloader.py#L79C7-L79C17) right now isn't perfect. It makes a lot of assumption that might cause problems later. Once those problems hit, we should redesign it. We held off because the torch.data already does a good job, but if we have to, we'll tackle it then.

frozenbugs avatar Apr 28 '24 06:04 frozenbugs

@frozenbugs @Rhett-Ying https://github.com/pytorch/data/pull/1277, we need to pin torchdata<0.8.0 to avoid the deprecation message.

mfbalin avatar Jul 13 '24 19:07 mfbalin

@frozenbugs I bet we can implement IterDataPipe in less than 100 lines of code and get rid of our dependency forever. MapDataPipe would be 10 lines of code once IterDataPipe is in place. The utility functions should also be doable.

mfbalin avatar Jul 31 '24 23:07 mfbalin

Archived: It is probably not an option to pin torchdata < 0.8 since it has dependency on torch core, which will lead to dependency on 2 different torch version https://github.com/dmlc/dgl/pull/7604.

frozenbugs avatar Aug 01 '24 02:08 frozenbugs

#7609 has removed torchdata imports from GraphBolt. We still need to remove torchdata as a dependency.

mfbalin avatar Aug 07 '24 00:08 mfbalin

#7609 has removed torchdata imports from GraphBolt. We still need to remove torchdata as a dependency.

Pending https://github.com/dmlc/dgl/pull/7667

frozenbugs avatar Aug 07 '24 09:08 frozenbugs

Close this for now, let's see what's new from Torch.

frozenbugs avatar Aug 20 '24 09:08 frozenbugs