[Dist] Reduce peak memory in DistDGL

Open Rhett-Ying opened this issue 2 years ago • 10 comments

Description

All the action items listed in https://github.com/dmlc/dgl/issues/4510 have been handled.

Checklist

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [ ] Code is well-documented
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • [ ] Related issue is referred to in this PR
  • [ ] If the PR is for a new model/paper, I've updated the example index here.

Changes

Rhett-Ying avatar Oct 08 '22 09:10 Rhett-Ying

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch]. For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot avatar Oct 08 '22 11:10 dgl-bot

Commit ID: 84dd4df5de05f52e9b6586a644a9c1e7f39aaee2

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot avatar Oct 08 '22 13:10 dgl-bot

Commit ID: 3e048a2e5296056018230e7e0fa6e0d442ac120e

Build ID: 2

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot avatar Oct 09 '22 01:10 dgl-bot

Commit ID: f68cd0cf869887dcc19582a8e98aaee259ddabdf

Build ID: 3

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot avatar Oct 09 '22 03:10 dgl-bot

Commit ID: a7cb67a1d0385b64904d108f6387ef673df93534

Build ID: 4

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

dgl-bot avatar Oct 09 '22 04:10 dgl-bot

Commit ID: e77ee160e219c0da96b2ffadccbd451af1214306

Build ID: 5

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

dgl-bot avatar Oct 10 '22 07:10 dgl-bot

Commit ID: f21d4b6f7a813771f8210c9daeda7165a849403c

Build ID: 6

Status: ❌ CI test failed in Stage [PyTorch Cugraph GPU Unit test].

Report path: link

Full logs path: link

dgl-bot avatar Oct 12 '22 06:10 dgl-bot

Commit ID: f21d4b6f7a813771f8210c9daeda7165a849403c

Build ID: 7

Status: ❌ CI test failed in Stage [PyTorch Cugraph GPU Unit test].

Report path: link

Full logs path: link

dgl-bot avatar Oct 12 '22 08:10 dgl-bot

Commit ID: 7af3fe7e63b244dd96701e688397aa216336d014

Build ID: 8

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

dgl-bot avatar Oct 13 '22 02:10 dgl-bot

Commit ID: a2e24523aa69e18c624ec8cbc27070f925f4c0d8

Build ID: 9

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

dgl-bot avatar Oct 13 '22 08:10 dgl-bot