[GraphBolt] Update `ItemSampler`
Description
- Update `ItemSampler` to support correct stochastic sharding across distributed groups (see the sharding sketch below).
- Modify the logic of `ItemSet.__getitem__()` when the index is an iterable of int (see the toy indexing sketch below).
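A hedged sketch of the stochastic-sharding idea, assuming the common shared-seed scheme; this is an illustration, not the PR's actual implementation, and `shard_indices` is a hypothetical helper:

```python
# Illustrative sketch only -- not the PR's actual implementation.
# Stochastic sharding via a shared seed: every rank builds the same random
# permutation of the full index list, then keeps a disjoint slice of it.
import torch

def shard_indices(num_items: int, rank: int, world_size: int, seed: int) -> torch.Tensor:
    # Hypothetical helper, not a GraphBolt API.
    g = torch.Generator().manual_seed(seed)        # same seed on every rank
    perm = torch.randperm(num_items, generator=g)  # identical permutation everywhere
    return perm[rank::world_size]                  # disjoint, shuffled shard per rank

# Example: 10 items across 2 ranks; the shards are disjoint and cover all items.
print(shard_indices(10, rank=0, world_size=2, seed=42))
print(shard_indices(10, rank=1, world_size=2, seed=42))
```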
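And a toy sketch of the intended `__getitem__` semantics when the index is an iterable of int (again an illustration, not GraphBolt's actual code):

```python
# Toy illustration -- not GraphBolt's actual ItemSet implementation.
import torch

class ToyItemSet:
    def __init__(self, items: torch.Tensor):
        self._items = items

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._items[index]
        # Iterable of int: gather every requested position in one shot.
        return self._items[torch.as_tensor(list(index))]

items = ToyItemSet(torch.arange(100, 110))
print(items[3])          # tensor(103)
print(items[[0, 2, 4]])  # tensor([100, 102, 104])
```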
Benchmark: https://docs.google.com/document/d/1Pzk2PJoFtTZSu17wTXVK4mqvfrMLAj2xK6fcGC1pwEg/edit?usp=sharing
Checklist
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [ ] I've leveraged the tools to beautify the Python and C++ code.
- [ ] The PR is complete and small; read the Google eng practice (a CL equals a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests and documentation may be exempted).
- [ ] All changes have test coverage
- [ ] Code is well-documented
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
- [ ] Related issue is referred in this PR
- [ ] If the PR is for a new model/paper, I've updated the example index here.
Changes
To trigger regression tests:
`@dgl-bot run [instance-type] [which tests] [compare-with-branch]`; for example: `@dgl-bot run g4dn.4xlarge all dmlc/master` or `@dgl-bot run c5.9xlarge kernel,api dmlc/master`
The fact that runtime performance is unchanged is good. However, to verify whether the old or the new implementation is more performant, we need to track CPU utilization. Since the ItemSampler and the rest of the sampling pipeline run concurrently, runtime alone is not enough information to determine that.
Commit ID: 0134ddf809e73c871f5aacbe4cf9576544531f81
Build ID: 1
Status: ✅ CI test succeeded.
Report path: link
Full logs path: link
Sure, could you please give me some guidance on how to do that?
The simplest way to monitor is to run `htop -d5` or `nvtop` and watch the CPU utilization. If you want to be more precise, you could also insert timing code into ItemSampler to see how long each call takes. It may also make sense to write a regression benchmark so that we can monitor its runtime: we can have a sampling pipeline containing only the ItemSampler and benchmark it, as in the sketch below.
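A minimal sketch of such an ItemSampler-only benchmark; the constructor arguments (`names="seeds"`, `batch_size`, `shuffle`) are assumptions based on the public GraphBolt docs and may need adjusting for your DGL version:

```python
# Hedged sketch of a benchmark for a pipeline containing only the ItemSampler.
# Argument names are assumptions from the public GraphBolt docs.
import time
import torch
import dgl.graphbolt as gb

item_set = gb.ItemSet(torch.arange(1_000_000), names="seeds")
sampler = gb.ItemSampler(item_set, batch_size=1024, shuffle=True)

start = time.perf_counter()
num_batches = sum(1 for _ in sampler)  # drain the sampler, counting minibatches
elapsed = time.perf_counter() - start
print(f"{num_batches} batches in {elapsed:.3f}s "
      f"({num_batches / elapsed:.1f} batches/s)")
```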
Commit ID: 428d47a3b483c1e7f9b403124a56763afbac038b
Build ID: 2
Status: ✅ CI test succeeded.
Report path: link
Full logs path: link
Commit ID: fa3ccb337322150991a773951da7f74ed897e8a7
Build ID: 3
Status: ✅ CI test succeeded.
Report path: link
Full logs path: link
Commit ID: bacdfb1c6088d51474c3581e84ef998da9d6185b
Build ID: 4
Status: ✅ CI test succeeded.
Report path: link
Full logs path: link
@mfbalin Thank you for your valuable suggestions, but considering the scope of this PR, I'd like to defer them to another PR.
@Rhett-Ying If everything looks good to you, please approve so I can move forward.
@Skeleton003 According to the doc, examples/sampling/graphbolt/link_prediction.py runs a bit slower. Please keep an eye on the regression results.
Commit ID: 1d69f6b7b4119f7cefb5801ba4ea01c94c573a4f
Build ID: 5
Status: ⚪️ CI test cancelled due to overrun.
Report path: link
Full logs path: link