[WIP] Support GPU `bulkstore` in vineyardd
What do these changes do?
FEATURE-759: try to enable GPU memory object sharing. For now, a naïve implementation of `createGPUBuffer` is provided.
Design Brief
- Implement `GPUBulkStore` with the same functional interface as `BulkStore`.
- Implement a GPU memory allocator; for now, a naïve "allocate-on-request" strategy is adopted.
- Add GPU-specific session support to both the client and the server.
- The server sends the `cudaIpcMemHandle_t` to the client in its reply; the client opens the handle to gain access to the shared GPU memory (see the sketch after this list).
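For concreteness, here is a minimal sketch of that handle flow using the CUDA runtime API. The function names `ServerCreateGPUBuffer` and `ClientMapGPUBuffer` are hypothetical and do not correspond to vineyard's actual code.

```cpp
// Minimal sketch of the cudaIpcMemHandle_t flow described above. The function
// names are illustrative, not vineyard's actual API.
#include <cuda_runtime.h>

#include <cstddef>

// Server side: allocate device memory and export an IPC handle that can be
// serialized into the reply message sent back to the client.
cudaError_t ServerCreateGPUBuffer(std::size_t size, void** dev_ptr,
                                  cudaIpcMemHandle_t* handle) {
  cudaError_t err = cudaMalloc(dev_ptr, size);
  if (err != cudaSuccess) {
    return err;
  }
  return cudaIpcGetMemHandle(handle, *dev_ptr);
}

// Client side (a different process): open the handle received in the reply
// to obtain a device pointer to the same physical GPU memory.
cudaError_t ClientMapGPUBuffer(cudaIpcMemHandle_t handle, void** dev_ptr) {
  return cudaIpcOpenMemHandle(dev_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
}
```

Note that the handle can only be opened in a process other than the one that allocated the memory, and the client should eventually release the mapping with `cudaIpcCloseMemHandle`.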
New classes:
- `GPUBulkStore`: has the same interface as `BulkStore`.
- `GPUBulkAllocator`: has the same interface as `BulkAllocator`.
- `GPUAllocator`: a real GPU memory allocator; left for future work.
Main related classes:
- `Client`: adds GPU-specific request methods, such as `createGPUBuffer`.
- `VineyardServer`, `SocketConnection`: add GPU-specific session support.
- `Payload`: adds an `is_gpu` flag to distinguish GPU objects from CPU objects (illustrated in the sketch after this list).
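As a rough illustration of the `is_gpu` flag, the field layout below is hypothetical and omits most of the real `Payload` fields; only the GPU-related additions are shown.

```cpp
// Simplified illustration only: vineyard's real Payload carries more fields.
// The point here is the extra is_gpu flag and the IPC handle that a
// GPU-resident blob needs.
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>

struct PayloadSketch {
  uint64_t object_id = 0;       // id of the blob
  std::size_t data_size = 0;    // size of the blob in bytes
  bool is_gpu = false;          // true when the blob lives in GPU memory
  cudaIpcMemHandle_t handle{};  // only meaningful when is_gpu is true
};
```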
Detailed Design
- GPU Memory
  - Implement `GPUBulkStore` and `GPUBulkAllocator` for managing objects on the GPU (a naïve allocate-on-request sketch follows this list).
  - A `GPUAllocator` similar to dlmalloc is not implemented yet.
- Client
  - Add new methods, such as `createGPUBuffer`, that send GPU-specific requests to the server.
  - The client receives a `cudaIpcMemHandle_t` in the reply and opens the handle to access the shared GPU memory.
- Server
  - Add `gpu_bulk_store_` for managing GPU objects.
  - Add GPU-specific session support and respond to the client's GPU-specific requests.
  - GPU object sharing is accomplished by obtaining a `cudaIpcMemHandle_t` and sending the handle to the client in the reply.
- Client/Server interaction
  - Add the necessary protocols, such as `CreateGPUBufferRequest`, and the associated functions.
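To make the allocate-on-request idea concrete, here is a self-contained sketch of what such a store could look like. The class and member names are made up for illustration and do not match vineyard's actual `GPUBulkStore`.

```cpp
// Sketch of the naive "allocate-on-request" path on the server: every create
// request triggers a fresh cudaMalloc, and the exported IPC handle is
// recorded so the server can return it to the client in the reply.
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct GPUPayloadSketch {
  void* pointer = nullptr;
  std::size_t data_size = 0;
  cudaIpcMemHandle_t handle{};
};

class GPUBulkStoreSketch {
 public:
  // Allocate `size` bytes on the GPU for object `id` and export its handle.
  cudaError_t Create(uint64_t id, std::size_t size, GPUPayloadSketch* out) {
    GPUPayloadSketch payload;
    payload.data_size = size;
    cudaError_t err = cudaMalloc(&payload.pointer, size);
    if (err != cudaSuccess) {
      return err;
    }
    err = cudaIpcGetMemHandle(&payload.handle, payload.pointer);
    if (err != cudaSuccess) {
      cudaFree(payload.pointer);
      return err;
    }
    objects_[id] = payload;
    *out = payload;
    return cudaSuccess;
  }

  // Release the GPU memory backing object `id`.
  cudaError_t Delete(uint64_t id) {
    auto it = objects_.find(id);
    if (it == objects_.end()) {
      return cudaErrorInvalidValue;
    }
    cudaError_t err = cudaFree(it->second.pointer);
    objects_.erase(it);
    return err;
  }

 private:
  std::unordered_map<uint64_t, GPUPayloadSketch> objects_;
};
```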
Related issue number
Feature #759 cc @septicmk
Hi @CSWater, I have left some comments for you. I suggest you consider the following at an early stage:
- Make the GPU feature optional.
- Allocate the GPU memory on demand; building a custom GPU allocator is not in the plan.
- Disable the ColdObjectTracker, since that is another, orthogonal, under-construction feature.
- We still need the unit tests, but don't trigger them in CI since we do not have a GPU environment online.
Hi @CSWater, I have the following suggestions:
- We should push down the `ENABLE_GPU` macros into `/gpu` to make the regular code path clearer. GPU-related methods are always callable but return `Status::NotImplement()` when the GPU feature is not toggled on (see the sketch after this comment).
- To support heterogeneous objects (blobs allocated in both GPU and CPU memory) we should merge `GPUBulkStore` and `BulkStore`. You can add something like `create_gpu` and `delete_gpu` to the `BulkStore`.
- Rebase the code and resolve the conflicts after you have finalized your code.
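A minimal sketch of the pattern suggested in the first bullet; the `Status`-like type below is a stand-in rather than vineyard's real `Status` class, and only the method name `NotImplement` comes from the review above.

```cpp
// GPU entry points are always compiled and callable, but when ENABLE_GPU is
// not defined they report that the feature is unavailable instead of being
// #ifdef'ed out at every call site.
#include <cstddef>
#include <string>
#include <utility>

struct StatusSketch {
  bool ok;
  std::string message;
  static StatusSketch OK() { return {true, ""}; }
  static StatusSketch NotImplement(std::string msg) {
    return {false, std::move(msg)};
  }
};

StatusSketch CreateGPUBufferSketch(std::size_t size) {
#ifdef ENABLE_GPU
  // Regular GPU code path: allocate on the GPU, export the
  // cudaIpcMemHandle_t, and send it back in the reply.
  (void)size;
  return StatusSketch::OK();
#else
  // The method still exists and links, so callers need no #ifdef guards.
  (void)size;
  return StatusSketch::NotImplement("vineyard was built without GPU support");
#endif
}
```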
Codecov Report
Merging #876 (dfcc2a7) into main (d77c3e2) will decrease coverage by `1.72%`. The diff coverage is `4.76%`.
:exclamation: Current head dfcc2a7 differs from pull request most recent head 420c28c. Consider uploading reports for the commit 420c28c to get more accurate results
```diff
@@            Coverage Diff             @@
##             main     #876      +/-   ##
==========================================
- Coverage   70.70%   68.98%   -1.73%
==========================================
  Files          77       80       +3
  Lines        8030     8234     +204
==========================================
+ Hits         5678     5680       +2
- Misses       2352     2554     +202
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/client/client.cc | 65.68% <0.00%> (-2.28%) | :arrow_down: |
| src/client/client.h | 100.00% <ø> (ø) | |
| src/common/memory/gpu/unified_memory.cc | 0.00% <0.00%> (ø) | |
| src/common/memory/gpu/unified_memory.h | 0.00% <0.00%> (ø) | |
| src/common/util/protocols.cc | 70.10% <0.00%> (-4.85%) | :arrow_down: |
| src/server/async/socket_server.cc | 77.71% <0.00%> (-1.54%) | :arrow_down: |
| src/server/async/socket_server.h | 100.00% <ø> (ø) | |
| src/server/memory/gpu/gpuallocator.cc | 0.00% <0.00%> (ø) | |
| src/server/memory/memory.h | 100.00% <ø> (ø) | |
| src/server/server/vineyard_server.h | 100.00% <ø> (ø) | |
| ... and 7 more | | |