dgl icon indicating copy to clipboard operation
dgl copied to clipboard

[Feature] Fixed sampler with limit on sampled nodes/edges in batch subgraph

Open ayushnoori opened this issue 1 year ago • 18 comments

Description

Subgraph sampler that sets an upper bound on the number of nodes included in each layer of the sampled subgraph. At each layer, the frontier is randomly subsampled. Rare node types can also be upsampled by taking the scaled square root of the sampling probabilities (best strategy TBD). The new FixedSampler performs node-wise neighbor sampling and returns the subgraph induced by all the sampled nodes.

The relevant issue is: https://github.com/dmlc/dgl/issues/6623, thanks to @frozenbugs and @jermainewang for their input and review.

Checklist

Please feel free to remove inapplicable items for your PR.

  • [X] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • [X] Code is well-documented
  • [X] Related issue is referred in this PR
  • [ ] All changes have test coverage

Please note that unit tests for FixedSampler are not yet written.

ayushnoori avatar Dec 04 '23 02:12 ayushnoori

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Dec 04 '23 02:12 dgl-bot

Commit ID: 928fe213334241781dfd646932af5ca24e85a93f

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Dec 04 '23 02:12 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Dec 28 '23 20:12 dgl-bot

Commit ID: fc5ce86a2ce54252235b1672124a28ca79437ca0

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Dec 28 '23 20:12 dgl-bot

Thanks for the edits, @frozenbugs. Happy to address all the formatting errors. Do you see any issues with the function implementation itself?

ayushnoori avatar Jan 28 '24 00:01 ayushnoori

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jan 28 '24 00:01 dgl-bot

Commit ID: 4228344858707739659a8c44b2234074f8230d74

Build ID: 3

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Jan 28 '24 00:01 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jan 28 '24 01:01 dgl-bot

Commit ID: 6f8acdb2fac5a7eb477c51a9162b92ab1e343eaa

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Jan 28 '24 01:01 dgl-bot

@dgl-bot

frozenbugs avatar Jan 29 '24 10:01 frozenbugs

Commit ID: 6f8acdb2fac5a7eb477c51a9162b92ab1e343eaa

Build ID: 5

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot avatar Jan 29 '24 10:01 dgl-bot

the code LGTM now, can you run lintrunner -a to fix the lint error? and can you add a unit test for your code.

The test case can be put here: tests/python/pytorch/dataloading, and you can use this test case as example: https://github.com/dmlc/dgl/blob/master/tests/python/pytorch/graphbolt/impl/test_in_subgraph_sampler.py

frozenbugs avatar Jan 29 '24 10:01 frozenbugs

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Feb 09 '24 20:02 dgl-bot

Commit ID: d42b1ec2bdca14a28da11c3e5cbb5cab5f2aa768

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Feb 09 '24 20:02 dgl-bot

@dgl-bot

frozenbugs avatar Feb 19 '24 02:02 frozenbugs

Commit ID: cdc82adc722c688ee0abd1125069d970bb157f05

Build ID: 7

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot avatar Feb 19 '24 02:02 dgl-bot

************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:60:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)
python/dgl/dataloading/capped_neighbor_sampler.py:151:27: C0103: Variable name "node_IDs" doesn't conform to snake_case naming style (invalid-name)
python/dgl/dataloading/capped_neighbor_sampler.py:102:27: W0612: Unused variable 'rel_type' (unused-variable)
python/dgl/dataloading/capped_neighbor_sampler.py:102:37: W0612: Unused variable 'dst_type' (unused-variable)

frozenbugs avatar Feb 22 '24 06:02 frozenbugs

@ayushnoori if you don't plan to write a unit test, we can also merge this, can you add a comment in capped_neighbor_sampler.py, says "This code was contributed by a community member (ayushnoori). There aren't currently any unit tests in place to verify its functionality, so please be cautious if you need to make any changes to the code's logic."

frozenbugs avatar Feb 22 '24 06:02 frozenbugs

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Commit ID: 6c5cf83e7f67aee639b4345c3481e96cf85559c1

Build ID: 8

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Commit ID: 02d95f86682c60658fb08e116b0239a54ec8be9f

Build ID: 9

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

@dgl-bot

mfbalin avatar Jul 06 '24 05:07 mfbalin

Hi @frozenbugs, I'm kindly following up with this PR. I've added the requested comment – please let me know if this LGTM. Thanks again!

ayushnoori avatar Jul 06 '24 05:07 ayushnoori

Commit ID: 69ffb17fab9147de1a4d527b7f0b1fef3602f8c7

Build ID: 10

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Commit ID: c11b8717a8c021dc0af61a22e05b257dfdb3a399

Build ID: 11

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

dgl-bot avatar Jul 06 '24 05:07 dgl-bot

Hmm, not sure why CI test failed. I already ran lintrunner -a, this is what I see:

> lintrunner -a
Warning: Could not find a lintrunner config at: '.lintrunner.private.toml'. Continuing without using configuration file.
WARNING: No previous init data found. If this is the first time you're running lintrunner, you should run `lintrunner init`.
ok No lint issues.
Successfully applied all patches.

> git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Have also merged changes so this PR is up-to-date.

ayushnoori avatar Jul 06 '24 05:07 ayushnoori

There are 2 lint checkers, you passed the initial one but not the one in the CI that runs after that.

https://dgl-ci-result.s3.us-west-2.amazonaws.com/dgl/PR-6668/10/10/logs/logs_dir/tmp1ptbor3n.log

************* Module dgl.dataloading.capped_neighbor_sampler
python/dgl/dataloading/capped_neighbor_sampler.py:65:4: W0221: Parameters differ from overridden 'sample' method (arguments-differ)
python/dgl/dataloading/capped_neighbor_sampler.py:156:27: C0103: Variable name "node_IDs" doesn't conform to snake_case naming style (invalid-name)
python/dgl/dataloading/capped_neighbor_sampler.py:107:27: W0612: Unused variable 'rel_type' (unused-variable)
python/dgl/dataloading/capped_neighbor_sampler.py:107:37: W0612: Unused variable 'dst_type' (unused-variable)

mfbalin avatar Jul 06 '24 05:07 mfbalin

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

dgl-bot avatar Jul 06 '24 06:07 dgl-bot