[WIP] Support GPU `bulkstore` in vineyardd
What do these changes do?
FEATURE-759: try to enable GPU memory object sharing. For now, a naïve implementation of `createGPUBuffer` is provided.
Design Brief
- Implement `GPUBulkStore` with the same functional interface as `BulkStore`.
- Implement a GPU memory allocator; for now, a naïve "allocate-on-request" strategy is adopted.
- Add GPU-specific session support to both the client and the server.
- The server sends the `cudaIpcMemHandle_t` to the client in its reply; the client opens the handle to gain access to the shared GPU memory (see the sketch after this list).
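For concreteness, here is a minimal sketch of that handle flow using the CUDA runtime API. The function names `ServerCreateGPUBuffer` and `ClientMapGPUBuffer` are hypothetical and do not correspond to vineyard's actual code.

```cpp
// Minimal sketch of the cudaIpcMemHandle_t flow described above. The function
// names are illustrative, not vineyard's actual API.
#include <cuda_runtime.h>

#include <cstddef>

// Server side: allocate device memory and export an IPC handle that can be
// serialized into the reply message sent back to the client.
cudaError_t ServerCreateGPUBuffer(std::size_t size, void** dev_ptr,
                                  cudaIpcMemHandle_t* handle) {
  cudaError_t err = cudaMalloc(dev_ptr, size);
  if (err != cudaSuccess) {
    return err;
  }
  return cudaIpcGetMemHandle(handle, *dev_ptr);
}

// Client side (a different process): open the handle received in the reply
// to obtain a device pointer to the same physical GPU memory.
cudaError_t ClientMapGPUBuffer(cudaIpcMemHandle_t handle, void** dev_ptr) {
  return cudaIpcOpenMemHandle(dev_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
}
```

Note that the handle can only be opened in a process other than the one that allocated the memory, and the client should eventually release the mapping with `cudaIpcCloseMemHandle`.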
New classes:
- `GPUBulkStore`: has the same interface as `BulkStore`.
- `GPUBulkAllocator`: has the same interface as `BulkAllocator`.
- `GPUAllocator`: a real GPU memory allocator; left for future work.
Main related classes:
- `Client`: adds GPU-specific request methods, such as `createGPUBuffer`.
- `VineyardServer`, `SocketConnection`: add GPU-specific session support.
- `Payload`: adds an `is_gpu` flag to distinguish GPU objects from CPU objects (illustrated in the sketch after this list).
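As a rough illustration of the `is_gpu` flag, the field layout below is hypothetical and omits most of the real `Payload` fields; only the GPU-related additions are shown.

```cpp
// Simplified illustration only: vineyard's real Payload carries more fields.
// The point here is the extra is_gpu flag and the IPC handle that a
// GPU-resident blob needs.
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>

struct PayloadSketch {
  uint64_t object_id = 0;       // id of the blob
  std::size_t data_size = 0;    // size of the blob in bytes
  bool is_gpu = false;          // true when the blob lives in GPU memory
  cudaIpcMemHandle_t handle{};  // only meaningful when is_gpu is true
};
```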
Detailed Design
- GPU Memory
  - Implement `GPUBulkStore` and `GPUBulkAllocator` for managing objects on the GPU (a naïve allocate-on-request sketch follows this list).
  - A `GPUAllocator` similar to dlmalloc is not implemented yet.
- Client
  - Add new methods, such as `createGPUBuffer`, that send GPU-specific requests to the server.
  - The client receives a `cudaIpcMemHandle_t` in the reply and opens the handle to access the shared GPU memory.
- Server
  - Add `gpu_bulk_store_` for managing GPU objects.
  - Add GPU-specific session support and respond to the client's GPU-specific requests.
  - GPU object sharing is accomplished by obtaining a `cudaIpcMemHandle_t` and sending the handle to the client in the reply.
- Client/Server interaction
  - Add the necessary protocols, such as `CreateGPUBufferRequest`, and the associated functions.
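To make the allocate-on-request idea concrete, here is a self-contained sketch of what such a store could look like. The class and member names are made up for illustration and do not match vineyard's actual `GPUBulkStore`.

```cpp
// Sketch of the naive "allocate-on-request" path on the server: every create
// request triggers a fresh cudaMalloc, and the exported IPC handle is
// recorded so the server can return it to the client in the reply.
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct GPUPayloadSketch {
  void* pointer = nullptr;
  std::size_t data_size = 0;
  cudaIpcMemHandle_t handle{};
};

class GPUBulkStoreSketch {
 public:
  // Allocate `size` bytes on the GPU for object `id` and export its handle.
  cudaError_t Create(uint64_t id, std::size_t size, GPUPayloadSketch* out) {
    GPUPayloadSketch payload;
    payload.data_size = size;
    cudaError_t err = cudaMalloc(&payload.pointer, size);
    if (err != cudaSuccess) {
      return err;
    }
    err = cudaIpcGetMemHandle(&payload.handle, payload.pointer);
    if (err != cudaSuccess) {
      cudaFree(payload.pointer);
      return err;
    }
    objects_[id] = payload;
    *out = payload;
    return cudaSuccess;
  }

  // Release the GPU memory backing object `id`.
  cudaError_t Delete(uint64_t id) {
    auto it = objects_.find(id);
    if (it == objects_.end()) {
      return cudaErrorInvalidValue;
    }
    cudaError_t err = cudaFree(it->second.pointer);
    objects_.erase(it);
    return err;
  }

 private:
  std::unordered_map<uint64_t, GPUPayloadSketch> objects_;
};
```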
Related issue number
Feature #759 cc @septicmk
Hi @CSWater, I have left some comments for you. I suggest you consider the following at an early stage:
- Make the GPU feature optional.
- Allocate the GPU memory on demand; building a custom GPU allocator is not in the plan.
- Disable the ColdObjectTracker, since that is another, orthogonal, under-construction feature.
- We still need the unit tests, but don't trigger them in CI since we do not have a GPU environment online.
Hi @CSWater, I have the following suggestions:
- We should push down the `ENABLE_GPU` macros into `/gpu` to make the regular code path clearer. GPU-related methods are always callable but return `Status::NotImplement()` when the GPU feature is not toggled on (see the sketch after this comment).
- To support heterogeneous objects (blobs allocated in both GPU and CPU memory) we should merge `GPUBulkStore` and `BulkStore`. You can add something like `create_gpu` and `delete_gpu` to the `BulkStore`.
- Rebase the code and resolve the conflicts after you have finalized your code.
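A minimal sketch of the pattern suggested in the first bullet; the `Status`-like type below is a stand-in rather than vineyard's real `Status` class, and only the method name `NotImplement` comes from the review above.

```cpp
// GPU entry points are always compiled and callable, but when ENABLE_GPU is
// not defined they report that the feature is unavailable instead of being
// #ifdef'ed out at every call site.
#include <cstddef>
#include <string>
#include <utility>

struct StatusSketch {
  bool ok;
  std::string message;
  static StatusSketch OK() { return {true, ""}; }
  static StatusSketch NotImplement(std::string msg) {
    return {false, std::move(msg)};
  }
};

StatusSketch CreateGPUBufferSketch(std::size_t size) {
#ifdef ENABLE_GPU
  // Regular GPU code path: allocate on the GPU, export the
  // cudaIpcMemHandle_t, and send it back in the reply.
  (void)size;
  return StatusSketch::OK();
#else
  // The method still exists and links, so callers need no #ifdef guards.
  (void)size;
  return StatusSketch::NotImplement("vineyard was built without GPU support");
#endif
}
```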
Codecov Report
Merging #876 (dfcc2a7) into main (d77c3e2) will decrease coverage by `1.72%`. The diff coverage is `4.76%`.
:exclamation: Current head dfcc2a7 differs from pull request most recent head 420c28c. Consider uploading reports for the commit 420c28c to get more accurate results
```diff
@@            Coverage Diff             @@
##             main     #876      +/-   ##
==========================================
- Coverage   70.70%   68.98%   -1.73%
==========================================
  Files          77       80       +3
  Lines        8030     8234     +204
==========================================
+ Hits         5678     5680       +2
- Misses       2352     2554     +202
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/client/client.cc | 65.68% <0.00%> (-2.28%) | :arrow_down: |
| src/client/client.h | 100.00% <ø> (ø) | |
| src/common/memory/gpu/unified_memory.cc | 0.00% <0.00%> (ø) | |
| src/common/memory/gpu/unified_memory.h | 0.00% <0.00%> (ø) | |
| src/common/util/protocols.cc | 70.10% <0.00%> (-4.85%) | :arrow_down: |
| src/server/async/socket_server.cc | 77.71% <0.00%> (-1.54%) | :arrow_down: |
| src/server/async/socket_server.h | 100.00% <ø> (ø) | |
| src/server/memory/gpu/gpuallocator.cc | 0.00% <0.00%> (ø) | |
| src/server/memory/memory.h | 100.00% <ø> (ø) | |
| src/server/server/vineyard_server.h | 100.00% <ø> (ø) | |
| ... and 7 more | | |