[Feature] GPU cache for node and edge data
Description
A `gpu_cache` that can be used to cache vertex or edge feature storage is implemented. The core implementation comes from the HugeCTR repository; this PR is essentially a wrapper around the `gpu_cache` available there. The planned use case is in the DataLoader, to seamlessly wrap the feature storage and speed up access to node or edge features.
Fixes issue #3461.
Example usage:
python train_sampling_unsupervised.py --graph-device=gpu --data-device=uva --cache-size=1000000 --dataset=ogbn-products
This gets 206k samples/sec, while the version without the cache gets 130k samples/sec. When all of the features are on the GPU without the cache, it gets 350k samples/sec.
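For reviewers, here is a rough sketch of the idea behind the cache (pure Python and illustrative only; the actual PR uses HugeCTR's CUDA kernels, and none of the names below are part of the proposed API): recently requested feature rows are kept in GPU memory, and only misses go back to the underlying CPU/UVA storage.

```python
import torch

class NaiveGPUCacheSketch:
    """Illustrative FIFO cache in front of a feature tensor (not the real implementation)."""

    def __init__(self, features, cache_size, device='cuda'):
        self.features = features                                  # backing CPU/UVA feature tensor
        self.cache = torch.empty(cache_size, features.shape[1],
                                 dtype=features.dtype, device=device)
        self.key_to_slot = {}                                     # node id -> row in self.cache
        self.slot_to_key = [None] * cache_size
        self.next_slot = 0

    def __getitem__(self, node_ids):
        rows = []
        for nid in node_ids.tolist():
            slot = self.key_to_slot.get(nid)
            if slot is None:                                      # miss: copy the row onto the GPU
                slot = self.next_slot
                self.next_slot = (self.next_slot + 1) % self.cache.shape[0]
                evicted = self.slot_to_key[slot]
                if evicted is not None:
                    del self.key_to_slot[evicted]                 # FIFO eviction
                self.cache[slot] = self.features[nid].to(self.cache.device)
                self.key_to_slot[nid] = slot
                self.slot_to_key[slot] = nid
            rows.append(slot)
        return self.cache[torch.tensor(rows, device=self.cache.device)]
```

The real cache does the lookup, eviction, and gathering in batched CUDA kernels rather than a Python loop, which is where the speedup in the numbers above comes from.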
Checklist
Please feel free to remove inapplicable items for your PR.
- [x] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
- [ ] Code is well-documented
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
- [x] Related issue is referred in this PR
To trigger regression tests:
@dgl-bot run [instance-type] [which tests] [compare-with-branch]; for example: `@dgl-bot run g4dn.4xlarge all dmlc/master` or `@dgl-bot run c5.9xlarge kernel,api dmlc/master`
Commit ID: e7adf3dc3312743ea4730bf8172b78c67e5e2dfa
Build ID: 1
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Commit ID: f64b6ff8e8d59154e4d85832f216b28775a13add
Build ID: 2
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: 17bc74889aa274396115e39c22967ab36c93f0c0
Build ID: 3
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: c4344b5436fee1cb71797fb0207a4f618da40b09
Build ID: 4
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: 9152230225e9bd4e4806a781c716a55a9057add0
Build ID: 5
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Commit ID: cb028d60f564b6ea87512af0060be0df05481bb4
Build ID: 6
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: 61f31a64cb29238bf2e69081d7ccb0451f2d1873
Build ID: 7
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: 61931ba1ef44e9c0e8ca64d977f42dc24f1110b9
Build ID: 8
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Hi, thanks for the contribution. I think having a GPU cache is overall a good addition to DGL, but we need to think through the user experience first before making changes. Here are my major questions/suggestions:
- Is it possible to fold the GPU cache into one of the `FeatureStorage` classes? You could check out the existing feature storage classes here and the base class here. Perhaps you could create a subclass called `GPUCacheFeatureStorage` (see the sketch after this list).
- How to minimize package dependencies? The PR currently introduces a new third-party dependency on HugeCTR, which honestly I don't know much about. Is it possible to limit the dependency to the Python side? If HugeCTR provides Python APIs for creating and accessing the embedding cache, we could wrap it in a Python class.
- Let's not complicate the `train_unsupervised` script further, because its purpose is to educate novice users about unsupervised training. Your setting is more advanced and should be demonstrated with a standalone script.
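To make the first suggestion concrete, something along these lines is what I have in mind. The `fetch` signature follows the existing `FeatureStorage` base class as I understand it; the `query`/`replace` calls on the cache object and the import path are assumptions, not an existing API:

```python
from dgl.storages.base import FeatureStorage   # import path assumed from the existing storages

class GPUCacheFeatureStorage(FeatureStorage):
    """Hypothetical subclass: answer cache hits from GPU memory and fall back
    to a wrapped storage for misses."""

    def __init__(self, base_storage, cache):
        self.base_storage = base_storage        # any existing FeatureStorage
        self.cache = cache                      # GPU cache object (e.g. HugeCTR-backed)

    def fetch(self, indices, device, pin_memory=False):
        # Assumed cache API: query() returns found rows plus the positions/keys of misses.
        values, missing_index, missing_keys = self.cache.query(indices)
        if missing_keys.numel() > 0:
            missing_values = self.base_storage.fetch(missing_keys, device, pin_memory)
            values[missing_index] = missing_values
            self.cache.replace(missing_keys, missing_values)   # warm the cache for next time
        return values
```

That way the cache stays transparent to the rest of the dataloading pipeline, since everything downstream only ever calls `fetch`.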
@jermainewang HugeCTR doesn't currently expose this via PyTorch; however, I think this only uses a handful of C++ files from it, so alternatively we could include just the needed files in `third_party` instead of as a submodule.
I agree; ideally we should have a way to wrap `FeatureStore` (or `FeatureSource` from #4431), so that regardless of where you're pulling features from, you could cache them on the training GPU to reduce traffic.
@jermainewang I have added `GpuCacheFeatureStorage` and `CachedTensor` classes for easier use of the `GpuCache`. With the addition of `CachedTensor`, it is very simple to use the `GpuCache`, and the modifications to the existing example are now minimal. However, I can still take those changes out and create a standalone example for the `GpuCache` once its API is finalized.
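Roughly, the usage now looks like the sketch below (module paths and constructor arguments here are placeholders rather than the finalized API): the node feature tensor is wrapped once, and any subsequent indexing goes through the cache.

```python
# Illustrative only; class locations and constructor arguments are placeholders.
# from dgl.contrib.gpu_cache import GpuCache, CachedTensor   # actual module path may differ

features = g.ndata.pop('features')                   # original node feature tensor
cache = GpuCache(1000000, features.shape[1])          # GPU-resident cache for hot feature rows
g.ndata['features'] = CachedTensor(features, cache)   # indexing now checks the cache first
```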
@nv-dlasalle I need feedback about the `device` argument of `UnifiedTensor`. `GpuCache` doesn't take a `device` argument and places the cache on the default CUDA device; should the API change so that the device is supplied by the user?
Commit ID: 4fa1994c8ad388564a25acb0f4258cf9b10880fc
Build ID: 9
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Commit ID: cac30539f31a497fe0fac779719b5ed05415fbf6
Build ID: 10
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
> @jermainewang HugeCTR doesn't currently expose this via PyTorch; however, I think this only uses a handful of C++ files from it, so alternatively we could include just the needed files in `third_party` instead of as a submodule.
Can you list the files to be included? Alternatively, we could borrow them into the source tree directly if there are not many and the license is compatible. The risk is that future patches in the upstream cannot be easily integrated here, which means we need to have an owner who knows them.
We need to think of how to use a customized FeatureStorage with `dgl.DGLGraph`, in particular without creating a customized wrapper of the `DGLGraph` object; this is what we previously did, but in retrospect I think it's burdensome. Perhaps having `get_node_storage` and `get_edge_storage` as methods of the GraphStorage is not a good option.
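One possible direction, purely as a strawman (the `node_feature_storages` argument below does not exist anywhere; it is only meant to illustrate avoiding a custom `DGLGraph` wrapper): let the DataLoader accept storage overrides directly, so a cached storage can be swapped in without touching the graph object.

```python
# Strawman sketch; 'node_feature_storages' is a hypothetical argument, not part of DGL today.
# g, train_nids, sampler, base_storage, and cache come from the usual training setup.
cached_feat = GPUCacheFeatureStorage(base_storage, cache)     # from the sketch above
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    node_feature_storages={'feat': cached_feat},              # override where 'feat' is fetched from
    batch_size=1024, device='cuda')
```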
> > @jermainewang HugeCTR doesn't currently expose this via PyTorch; however, I think this only uses a handful of C++ files from it, so alternatively we could include just the needed files in `third_party` instead of as a submodule.
>
> Can you list the files to be included? Alternatively, we could borrow them into the source tree directly if there are not many and the license is compatible. The risk is that future patches in the upstream cannot be easily integrated here, which means we need to have an owner who knows them.
There are only 4 source and header files needed for the GPU cache, basically the files under this directory: https://github.com/NVIDIA-Merlin/HugeCTR/tree/master/gpu_cache
Commit ID: 44b70bfce9ec470cb229f55ad7064a987138716a
Build ID: 11
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Commit ID: 4632b639cd69072aedb365fccb21f0d828b906dc
Build ID: 12
Status: ❌ CI test failed in Stage [GPU Build].
Report path: link
Full logs path: link
Can I get a second round of reviews for the recent updates implementing FeatureStorage and a new example using it for GPUCache training?
> Can I get a second round of reviews for the recent updates implementing FeatureStorage and a new example using it for GPUCache training?
Sorry for the late reply. We are waiting for a review of the Dataloader changes tomorrow, which will decide how to move forward with this PR.
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
@dgl-bot
Commit ID: 6b900daf2295685580dc08d08cb6e9b2b4ebc336
Build ID: 15
Status: ❌ CI test failed in Stage [Authentication].
Report path: link
Full logs path: link
@dgl-bot
Commit ID: 6b900daf2295685580dc08d08cb6e9b2b4ebc336
Build ID: 16
Status: ❌ CI test failed in Stage [Lint Check].
Report path: link
Full logs path: link
Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:
@dgl-bot
@dgl-bot