[Feature] (La)yer-Neigh(bor) sampling implementation
Description
CPU and GPU implementation of LABOR sampling, a drop-in replacement for Neighbor Sampling with the same fanout hyperparameters. This work was done partly during a summer NVIDIA Devtech AI internship under @nv-dlasalle's mentorship. To run:
cd examples/pytorch/labor
python train_lightning_labor.py
To compare with neighbor sampling, run python train_lightning_labor.py --sampler=neighbor.
With a batch size of 1000 and 3 layers with fanouts 10,10,10 on Reddit, LABOR with the importance-sampling=-1 option can sample 7x fewer vertices than neighbor sampling at the same quality, while the importance-sampling=0 option samples around 5x fewer and is generally faster than Neighbor Sampling.
Weighted sampling is also available, but it has not been tested yet.
Loss curves on different datasets with the same batch size. The soft edges represent the confidence interval. The number of sampled vertices can be found in the table below:
Vertex sampling efficiency under the same sampling budget. A starting batch size of 1k is used, and the batch size is adjusted at the end of each epoch to better match the vertex budget. The first row of plots shows the batch size, and the second row shows the running average of the number of vertices sampled in the last layer, $|V_3|$. The last row shows the training loss curves.
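The PR does not spell out how the batch size is adjusted at the end of each epoch. As a rough illustration only, one simple option is a proportional rule: scale the batch size by the ratio of the vertex budget to the observed running average of sampled vertices. The function name `adjust_batch_size` and its parameters are hypothetical and are not taken from the example script.

```python
def adjust_batch_size(batch_size, avg_sampled_vertices, vertex_budget, min_size=1):
    """Hypothetical proportional rule (an assumption, not the PR's code).

    Scales the batch size by vertex_budget / avg_sampled_vertices so that the
    number of sampled vertices moves toward the budget in the next epoch.
    """
    if avg_sampled_vertices <= 0:
        # No measurement available yet: keep the current batch size.
        return batch_size
    scaled = batch_size * vertex_budget / avg_sampled_vertices
    return max(min_size, round(scaled))


# Example: the sampler touched twice the budget on average, so halve the batch.
new_size = adjust_batch_size(1000, avg_sampled_vertices=50_000, vertex_budget=25_000)
print(new_size)
```

Because the number of sampled vertices generally does not grow linearly with the batch size (neighborhoods overlap), a single step of such a rule would only approximate the budget; repeating it every epoch lets the batch size settle gradually.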
Checklist
Please feel free to remove inapplicable items for your PR.
- [x] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [x] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] Code is well-documented
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
- [ ] If the PR is for a new model/paper, I've updated the example index here.
Changes
@dgl-bot
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
@dgl-bot
@dgl-bot
@mfbalin What is the difference between this LABOR algorithm and LADIES? If there is any difference, did you document the algorithm of LABOR algorithm somewhere?
Also I guess sample_labors is probably not a good name because labor is actually an English word and it does not refer to a specific algorithm so far (like the word Shadow in ShaDow-GNN). I would write out the full name of it (like LayerwiseSampler or something, or just LADIESSampler if there is no difference between LABOR and LADIES).
Commit ID: d3dfd60757563e1a58002a4a49ac941a557b07c7
Build ID: 10
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
@mfbalin What is the difference between this LABOR algorithm and LADIES? If there is any difference, did you document the algorithm of LABOR algorithm somewhere?
Also I guess sample_labors is probably not a good name because labor is actually an English word and it does not refer to a specific algorithm so far (like the word Shadow in ShaDow-GNN). I would write out the full name of it (like LayerwiseSampler or something, or just LADIESSampler if there is no difference between LABOR and LADIES).
It is a new algorithm. Like neighbor sampling, it samples fanout neighbors for each vertex, but it does so in a layer-sampling fashion. Since it combines layer and neighbor sampling, I call it labor sampling. A paper explaining the proposed method is currently under submission.
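To make the "neighbor sampling in a layer-sampling fashion" idea concrete, here is a toy sketch in plain Python. It is not the code in this PR. It assumes that the layer-style behavior comes from drawing one random variate per source vertex and reusing it for every seed, and that each neighbor of a seed with degree d is kept with probability fanout/d, so fanout neighbors are picked in expectation rather than exactly. The function names and the toy graph are hypothetical.

```python
import random


def sample_independent(in_neighbors, seeds, fanout, rng):
    """Baseline: every (neighbor, seed) pair draws its own random variate."""
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        if not nbrs:
            sampled[s] = set()
            continue
        p = min(1.0, fanout / len(nbrs))  # keeps `fanout` neighbors in expectation
        sampled[s] = {t for t in nbrs if rng.random() <= p}
    return sampled


def sample_shared(in_neighbors, seeds, fanout, rng):
    """Layer-style variant (assumed): one variate r[t] per source vertex,
    reused by every seed that has t as a neighbor."""
    r = {}
    sampled = {}
    for s in seeds:
        nbrs = in_neighbors[s]
        if not nbrs:
            sampled[s] = set()
            continue
        p = min(1.0, fanout / len(nbrs))
        chosen = set()
        for t in nbrs:
            if t not in r:
                r[t] = rng.random()  # drawn once, shared across all seeds
            if r[t] <= p:
                chosen.add(t)
        sampled[s] = chosen
    return sampled


# Toy graph: 50 seeds that all share the same 20 candidate in-neighbors.
rng = random.Random(0)
in_neighbors = {s: list(range(100, 120)) for s in range(50)}
seeds = list(in_neighbors)

indep = sample_independent(in_neighbors, seeds, 10, rng)
shared = sample_shared(in_neighbors, seeds, 10, rng)

unique_indep = set().union(*indep.values())
unique_shared = set().union(*shared.values())
print(len(unique_indep), len(unique_shared))
```

In this toy setup every seed still keeps about fanout neighbors, but with shared variates the seeds tend to agree on which neighbors to keep, so the union of sampled vertices in the layer is smaller. That overlap is the intuition behind sampling fewer vertices with the same fanout hyperparameters.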
@BarclayII This PR contains only the core of LABOR sampling now. #4718 has the example. Should be ready for reviews.
@dgl-bot
Commit ID: 5fd2645fa39254a3f509214b3dcf158ba6e73394
Build ID: 28
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: b2e1b2187db527553fe77669235cbea5c7053814
Build ID: 30
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: d8862d31039f402c8dbb298f7f352c8c08584477
Build ID: 33
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: a77efe6c70b1618afaf0dff68398d9bfdb3235b3
Build ID: 37
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Commit ID: 6ce66028ad712f49f76ace0e7c3f55d84127246b
Build ID: 35
Status: ❌ CI test failed in Stage [CPU Build (Win64)].
Report path: link
Full logs path: link
Generally LGTM.
@dgl-bot
Commit ID: e2eff30c38aa2fa3ae7dc2849501667d9ce578aa
Build ID: 39
Status: ✅ CI test succeeded
Report path: link
Full logs path: link
Commit ID: 6e30c549040423caa613b0b148c7977edcb59105
Build ID: 41
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: 7b3996cc05ac92fc6a2088a5f9f6eb8aaa51756f
Build ID: 54
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Commit ID: 0db3c423d07de69c5b2c19c33c9532e904eca0c4
Build ID: 56
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
Commit ID: 3ecad9b8d183bf05f302cbbf77e652cac1cd5691
Build ID: 57
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: 37944233d024a08e783e0ff5e62f0ab65513959d
Build ID: 58
Status: ❌ CI test failed in Stage [Distributed Torch CPU Unit test].
Report path: link
Full logs path: link