dgl
dgl copied to clipboard
[Feature] Fixed sampler with limit on sampled nodes/edges in batch subgraph
Description
Subgraph sampler that sets an upper bound on the number of nodes included in each layer of the sampled subgraph. At each layer, the frontier is randomly subsampled. Rare node types can also be upsampled by taking the scaled square root of the sampling probabilities (best strategy TBD). The new FixedSampler
performs node-wise neighbor sampling and returns the subgraph induced by all the sampled nodes.
The relevant issue is: https://github.com/dmlc/dgl/issues/6623, thanks to @frozenbugs and @jermainewang for their input and review.
Checklist
Please feel free to remove inapplicable items for your PR.
- [X] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
- [X] Code is well-documented
- [X] Related issue is referred in this PR
- [ ] All changes have test coverage
Please note that unit tests for FixedSampler
are not yet written.
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: 928fe213334241781dfd646932af5ca24e85a93f
Build ID: 1
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: fc5ce86a2ce54252235b1672124a28ca79437ca0
Build ID: 2
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Thanks for the edits, @frozenbugs. Happy to address all the formatting errors. Do you see any issues with the function implementation itself?
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: 4228344858707739659a8c44b2234074f8230d74
Build ID: 3
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: 6f8acdb2fac5a7eb477c51a9162b92ab1e343eaa
Build ID: 4
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: 6f8acdb2fac5a7eb477c51a9162b92ab1e343eaa
Build ID: 5
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
the code LGTM now, can you run lintrunner -a
to fix the lint error? and can you add a unit test for your code.
The test case can be put here: tests/python/pytorch/dataloading, and you can use this test case as example: https://github.com/dmlc/dgl/blob/master/tests/python/pytorch/graphbolt/impl/test_in_subgraph_sampler.py
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: d42b1ec2bdca14a28da11c3e5cbb5cab5f2aa768
Build ID: 6
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: cdc82adc722c688ee0abd1125069d970bb157f05
Build ID: 7
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:60:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)
python/dgl/dataloading/capped_neighbor_sampler.py:151:27: C0103: Variable name "node_IDs" doesn't conform to snake_case naming style (invalid-name)
python/dgl/dataloading/capped_neighbor_sampler.py:102:27: W0612: Unused variable 'rel_type' (unused-variable)
python/dgl/dataloading/capped_neighbor_sampler.py:102:37: W0612: Unused variable 'dst_type' (unused-variable)
@ayushnoori if you don't plan to write a unit test, we can also merge this, can you add a comment in capped_neighbor_sampler.py, says "This code was contributed by a community member (ayushnoori). There aren't currently any unit tests in place to verify its functionality, so please be cautious if you need to make any changes to the code's logic."
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: 6c5cf83e7f67aee639b4345c3481e96cf85559c1
Build ID: 8
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: 02d95f86682c60658fb08e116b0239a54ec8be9f
Build ID: 9
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Hi @frozenbugs, I'm kindly following up with this PR. I've added the requested comment – please let me know if this LGTM. Thanks again!
Commit ID: 69ffb17fab9147de1a4d527b7f0b1fef3602f8c7
Build ID: 10
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot
Commit ID: c11b8717a8c021dc0af61a22e05b257dfdb3a399
Build ID: 11
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Hmm, not sure why CI test failed. I already ran lintrunner -a
, this is what I see:
> lintrunner -a
Warning: Could not find a lintrunner config at: '.lintrunner.private.toml'. Continuing without using configuration file.
WARNING: No previous init data found. If this is the first time you're running lintrunner, you should run `lintrunner init`.
ok No lint issues.
Successfully applied all patches.
> git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Have also merged changes so this PR is up-to-date.
There are 2 lint checkers, you passed the initial one but not the one in the CI that runs after that.
https://dgl-ci-result.s3.us-west-2.amazonaws.com/dgl/PR-6668/10/10/logs/logs_dir/tmp1ptbor3n.log
************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:65:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)
python/dgl/dataloading/capped_neighbor_sampler.py:156:27: C0103: Variable name "node_IDs" doesn't conform to snake_case naming style (invalid-name)
python/dgl/dataloading/capped_neighbor_sampler.py:107:27: W0612: Unused variable 'rel_type' (unused-variable)
python/dgl/dataloading/capped_neighbor_sampler.py:107:37: W0612: Unused variable 'dst_type' (unused-variable)
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
-
@dgl-bot