dgl [Feature] (La)yer-Neigh(bor) sampling implementation

Description

CPU and GPU implementation of LABOR sampling, a drop-in replacement for Neighbor Sampling that uses the same fanout hyperparameters. This work was done partly during a summer NVIDIA Devtech AI internship under @nv-dlasalle's mentorship.

To run the example: cd examples/pytorch/labor && python train_lightning_labor.py. To compare with neighbor sampling: python train_lightning_labor.py --sampler=neighbor.

With a batch size of 1000 and 3 layers with fanouts 10,10,10 on reddit, LABOR with the importance-sampling=-1 option can sample 7x fewer vertices than neighbor sampling at the same quality, while the importance-sampling=0 option samples around 5x fewer vertices and is generally faster than Neighbor Sampling.

Weighted sampling is also available, but it has not been tested yet.

Figure (same_loss_zoom): The loss curve on different datasets with the same batch size. The soft edges represent the confidence interval. The number of sampled vertices can be found in the table below.

Figure (same_budget_ordered): Vertex sampling efficiency under the same sampling budget. A starting batch size of 1k is used, and the batch size is adjusted at the end of each epoch to better match the vertex budget. The first row of plots shows the batch size, and the second row shows the running average of the number of vertices sampled in the last layer, $|V_3|$. The last row shows the training loss curves.
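The caption says the batch size is adjusted at the end of each epoch to match the vertex budget, but it does not state the update rule. The following is a minimal sketch that assumes a simple proportional update. The function name `adjust_batch_size` and its parameters are hypothetical and are not taken from this PR.

```python
def adjust_batch_size(batch_size, avg_sampled_vertices, vertex_budget, min_size=1):
    """Rescale the batch size so the next epoch lands closer to the vertex budget.

    Assumption (not from the PR): the number of sampled vertices grows roughly
    in proportion to the batch size, so a proportional correction is used.
    """
    if avg_sampled_vertices <= 0:
        # Nothing was measured, so keep the current batch size.
        return batch_size
    scaled = round(batch_size * vertex_budget / avg_sampled_vertices)
    return max(min_size, scaled)


# Start at 1000 seeds; the last layer averaged 50,000 sampled vertices
# against a budget of 40,000, so the batch size shrinks.
new_size = adjust_batch_size(1000, 50_000, 40_000)
print(new_size)
```

A sampler that touches fewer vertices per seed ends up with a larger batch size under the same budget, which is what the first row of plots compares.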

Checklist

Please feel free to remove inapplicable items for your PR.

[x] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
[x] Changes are complete (i.e. I finished coding on this PR)
[ ] All changes have test coverage
[ ] Code is well-documented
[x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
[ ] If the PR is for a new model/paper, I've updated the example index here.

Changes

Sep 30 '22 16:09 mfbalin

@dgl-bot

Oct 06 '22 20:10 mfbalin

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

Oct 08 '22 02:10 dgl-bot

@dgl-bot

Oct 08 '22 04:10 BarclayII

@mfbalin What is the difference between this LABOR algorithm and LADIES? If there is any difference, did you document the LABOR algorithm somewhere?

Also I guess sample_labors is probably not a good name because labor is actually an English word and it does not refer to a specific algorithm so far (like the word Shadow in ShaDow-GNN). I would write out the full name of it (like LayerwiseSampler or something, or just LADIESSampler if there is no difference between LABOR and LADIES).

Oct 08 '22 04:10 BarclayII

Commit ID: d3dfd60757563e1a58002a4a49ac941a557b07c7

Build ID: 10

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Oct 08 '22 04:10 dgl-bot

> @mfbalin What is the difference between this LABOR algorithm and LADIES? If there is any difference, did you document the LABOR algorithm somewhere?
>
> Also I guess sample_labors is probably not a good name because labor is actually an English word and it does not refer to a specific algorithm so far (like the word Shadow in ShaDow-GNN). I would write out the full name of it (like LayerwiseSampler or something, or just LADIESSampler if there is no difference between LABOR and LADIES).

It is a new algorithm. Just like neighbor sampling, it samples fanout neighbors for each vertex, but it does this in a layer-sampling fashion. Since it is new and combines layer and neighbor sampling, I call it labor sampling. A paper explaining the proposed method is currently under submission.
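To make the contrast with neighbor sampling concrete, here is a small pure-Python sketch. It is not the CPU/GPU code in this PR. It assumes one way of sampling "in a layer-sampling fashion": every candidate vertex gets a single uniform random number that all seed vertices share, and a seed with d in-neighbors keeps a neighbor when that number is below fanout / d. This gives each seed about fanout neighbors in expectation while letting seeds agree on which vertices to pick. The function names are hypothetical.

```python
import random


def neighbor_sample(in_neighbors, seeds, fanout, rng):
    """Neighbor sampling: every seed independently draws `fanout` in-neighbors."""
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        if len(nbrs) <= fanout:
            sampled[s] = set(nbrs)
        else:
            sampled[s] = set(rng.sample(nbrs, fanout))
    return sampled


def shared_variate_sample(in_neighbors, seeds, fanout, rng):
    """Assumed layer-style variant: one random number per candidate vertex,
    shared by all seeds, so different seeds tend to pick the same vertices."""
    candidates = sorted({t for s in seeds for t in in_neighbors[s]})
    r = {t: rng.random() for t in candidates}  # values in [0, 1)
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        keep_prob = min(1.0, fanout / len(nbrs)) if nbrs else 0.0
        # Expected number of kept neighbors is min(fanout, len(nbrs)).
        sampled[s] = {t for t in nbrs if r[t] < keep_prob}
    return sampled


def unique_vertices(sampled):
    """All distinct vertices that would need features fetched for this layer."""
    return set().union(*sampled.values())


# Toy graph: 50 seeds that all share the same 100 candidate in-neighbors.
rng = random.Random(0)
pool = list(range(100, 200))
seeds = list(range(50))
in_neighbors = {s: list(pool) for s in seeds}

ns = neighbor_sample(in_neighbors, seeds, 10, rng)
ls = shared_variate_sample(in_neighbors, seeds, 10, rng)
print("neighbor sampling, unique vertices:", len(unique_vertices(ns)))
print("shared variates, unique vertices:  ", len(unique_vertices(ls)))
```

In this toy graph all seeds have identical neighborhoods, so the shared-variate version gives every seed the same neighbor set, while independent draws spread across the pool. Real graphs overlap only partially, so the savings there depend on the graph.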

Oct 10 '22 15:10 mfbalin

@BarclayII This PR contains only the core of LABOR sampling now. #4718 has the example. Should be ready for reviews.

Oct 17 '22 02:10 mfbalin

@dgl-bot

Oct 25 '22 01:10 yaox12

Commit ID: 5fd2645fa39254a3f509214b3dcf158ba6e73394

Build ID: 28

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Oct 25 '22 01:10 dgl-bot

@dgl-bot

Oct 25 '22 02:10 yaox12

Commit ID: b2e1b2187db527553fe77669235cbea5c7053814

Build ID: 30

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Oct 25 '22 02:10 dgl-bot

@dgl-bot

Oct 25 '22 02:10 yaox12

Commit ID: d8862d31039f402c8dbb298f7f352c8c08584477

Build ID: 33

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Oct 25 '22 02:10 dgl-bot

@dgl-bot

Oct 25 '22 03:10 yaox12

Commit ID: a77efe6c70b1618afaf0dff68398d9bfdb3235b3

Build ID: 37

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Oct 25 '22 03:10 dgl-bot

Commit ID: 6ce66028ad712f49f76ace0e7c3f55d84127246b

Build ID: 35

Status: ❌ CI test failed in Stage [CPU Build (Win64)].

Report path: link

Full logs path: link

Oct 25 '22 04:10 dgl-bot

Generally LGTM.

Oct 25 '22 07:10 yaox12

@dgl-bot

Oct 26 '22 01:10 yaox12

Commit ID: e2eff30c38aa2fa3ae7dc2849501667d9ce578aa

Build ID: 39

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

Oct 26 '22 01:10 dgl-bot

Commit ID: 6e30c549040423caa613b0b148c7977edcb59105

Build ID: 41

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Oct 27 '22 16:10 dgl-bot

@dgl-bot

Oct 28 '22 01:10 yaox12

Commit ID: 7b3996cc05ac92fc6a2088a5f9f6eb8aaa51756f

Build ID: 54

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Oct 28 '22 01:10 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

Oct 28 '22 06:10 dgl-bot

Commit ID: 0db3c423d07de69c5b2c19c33c9532e904eca0c4

Build ID: 56

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Oct 28 '22 06:10 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

Oct 28 '22 06:10 dgl-bot

Commit ID: 3ecad9b8d183bf05f302cbbf77e652cac1cd5691

Build ID: 57

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Oct 28 '22 06:10 dgl-bot

@dgl-bot

Oct 28 '22 09:10 BarclayII

Commit ID: 37944233d024a08e783e0ff5e62f0ab65513959d

Build ID: 58

Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].

Report path: link

Full logs path: link

Oct 28 '22 09:10 dgl-bot

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

@dgl-bot

Oct 28 '22 17:10 dgl-bot

Commit ID: cffe0a3ae0d8fa3fcb09853201fa3d19ad0d8f6c

Build ID: 59

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

Oct 28 '22 17:10 dgl-bot