[Dist] Reduce peak memory in DistDGL
Description
All the action items listed in https://github.com/dmlc/dgl/issues/4510 have been handled.
Checklist
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] Code is well-documented
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
- [ ] Related issue is referred to in this PR
- [ ] If the PR is for a new model/paper, I've updated the example index here.
Changes
To trigger regression tests:
- `@dgl-bot run [instance-type] [which tests] [compare-with-branch]`; for example, `@dgl-bot run g4dn.4xlarge all dmlc/master` or `@dgl-bot run c5.9xlarge kernel,api dmlc/master`
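For anyone scripting these trigger comments, here is a minimal Python sketch that assembles a comment string in the syntax quoted above. The helper name `build_regression_command` and its parameters are hypothetical and not part of DGL or the bot. It only formats text; it does not post anything or check that the instance type or test names are ones the bot accepts.

```python
from typing import Sequence


def build_regression_command(
    instance_type: str, tests: Sequence[str], compare_with: str
) -> str:
    """Format a trigger comment as '@dgl-bot run <instance> <tests> <branch>'.

    Hypothetical helper for illustration only; it mirrors the syntax shown
    in the bot message and performs no validation against the real bot.
    """
    if not tests:
        raise ValueError("at least one test group is required")
    # Multiple test groups are joined with commas and no spaces,
    # matching the "kernel,api" form in the example above.
    return f"@dgl-bot run {instance_type} {','.join(tests)} {compare_with}"


print(build_regression_command("g4dn.4xlarge", ["all"], "dmlc/master"))
print(build_regression_command("c5.9xlarge", ["kernel", "api"], "dmlc/master"))
```

The two calls reproduce the example comments given in the bot message.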
Commit ID: 84dd4df5de05f52e9b6586a644a9c1e7f39aaee2
Build ID: 1
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Commit ID: 3e048a2e5296056018230e7e0fa6e0d442ac120e
Build ID: 2
Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].
Report path: link
Full logs path: link
Commit ID: f68cd0cf869887dcc19582a8e98aaee259ddabdf
Build ID: 3
Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].
Report path: link
Full logs path: link
Commit ID: a7cb67a1d0385b64904d108f6387ef673df93534
Build ID: 4
Status: ✅ CI test succeeded
Report path: link
Full logs path: link
Commit ID: e77ee160e219c0da96b2ffadccbd451af1214306
Build ID: 5
Status: ✅ CI test succeeded
Report path: link
Full logs path: link
Commit ID: f21d4b6f7a813771f8210c9daeda7165a849403c
Build ID: 6
Status: ❌ CI test failed in Stage [PyTorch Cugraph GPU Unit test].
Report path: link
Full logs path: link
Commit ID: f21d4b6f7a813771f8210c9daeda7165a849403c
Build ID: 7
Status: ❌ CI test failed in Stage [PyTorch Cugraph GPU Unit test].
Report path: link
Full logs path: link