Is it possible to support vGPU?
like https://github.com/Project-HAMi/HAMi-core
Yes, support for the vGPU API should be possible; unfortunately, we don't have any GPUs that support it to develop and test against. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html
The annotations can be found here: https://github.com/kevmo314/scuda/blob/main/codegen/annotations.h
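For illustration, a query from that group such as nvmlDeviceGetSupportedVgpus might be annotated roughly like this. This is a sketch only, untested since we have no vGPU hardware, and it assumes the SEND_ONLY/SEND_RECV/RECV_ONLY/LENGTH direction markers follow the same conventions as the rest of annotations.h:

```c
/**
 * Sketch of a vGPU annotation; the direction markers below are assumptions,
 * not verified against the codegen, since we have no vGPU hardware to test on.
 * @param device SEND_ONLY
 * @param vgpuCount SEND_RECV capacity of vgpuTypeIds on input, entries written on output
 * @param vgpuTypeIds RECV_ONLY LENGTH:vgpuCount
 */
nvmlReturn_t nvmlDeviceGetSupportedVgpus(nvmlDevice_t device,
                                         unsigned int *vgpuCount,
                                         nvmlVgpuTypeId_t *vgpuTypeIds);
```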
I don't know of a good test case for vGPUs, though; ideally, a very minimal binary that runs through the APIs would make verification easier.
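As a starting point, a smoke test along these lines might do, assuming a vGPU-capable device at index 0 (on unsupported hardware the call should simply report NVML_ERROR_NOT_SUPPORTED):

```c
/* vgpu_check.c: minimal NVML vGPU probe (a sketch, not a verified test).
 * Build (include/lib paths assumed): gcc vgpu_check.c -o vgpu_check -lnvidia-ml
 * Run under scuda: SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so ./vgpu_check
 */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlReturn_t r = nvmlInit_v2();
    if (r != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit_v2 failed: %s\n", nvmlErrorString(r));
        return 1;
    }

    nvmlDevice_t dev;
    r = nvmlDeviceGetHandleByIndex_v2(0, &dev);
    if (r != NVML_SUCCESS) {
        fprintf(stderr, "nvmlDeviceGetHandleByIndex_v2 failed: %s\n", nvmlErrorString(r));
        nvmlShutdown();
        return 1;
    }

    /* vgpuCount is in/out: capacity of ids on input, entries written on output. */
    nvmlVgpuTypeId_t ids[64];
    unsigned int count = 64;
    r = nvmlDeviceGetSupportedVgpus(dev, &count, ids);
    printf("nvmlDeviceGetSupportedVgpus: %s (%u vGPU type(s))\n",
           nvmlErrorString(r), count);

    nvmlShutdown();
    return 0;
}
```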
Do you have any more complex test cases that run? Currently, I can only execute the simplest nvidia-smi command.
build image

```sh
docker build . -f Dockerfile.build -t scuda-builder-12.6.0 \
  --build-arg CUDA_VERSION=12.6.0 \
  --build-arg DISTRO_VERSION=22.04 \
  --build-arg OS_DISTRO=ubuntu \
  --build-arg CUDNN_TAG=cudnn
```
create docker network

```sh
docker network create scuda
```
start server

```sh
docker run -it --rm --gpus=all -p 14833:14833 --name scuda-server --network scuda scuda-builder-12.6.0 /bin/bash -c "./local.sh server"
```
start client

```sh
docker run -it --rm --name scuda-client --network scuda scuda-builder-12.6.0 /bin/bash
```
test nvidia-smi

```sh
docker cp $(which nvidia-smi) scuda-client:/home/nvidia-smi
docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so ./nvidia-smi"
```

```
Segfault handler installed.
Wed Jan 15 01:48:43 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P2000                   On  |   00000000:01:00.0 Off |                  N/A |
| 44%   29C    P8              5W /  75W  |          Uninitialized |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
test cuda api (aborted)

```sh
docker exec -it scuda-client /bin/bash -c "nvcc test/cublas_unified.cu -g -o cublas_unified -lcublas -L/usr/local/cuda/lib64"
docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so cuda-gdb ./cublas_unified"
```
```
NVIDIA (R) cuda-gdb 12.6
Portions Copyright (C) 2007-2024 NVIDIA Corporation
Based on GNU gdb 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This CUDA-GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/cuda-gdb>.
Find the CUDA-GDB manual and other documentation resources online at:
<https://docs.nvidia.com/cuda/cuda-gdb/index.html>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./cublas_unified...
(cuda-gdb) run
Starting program: /home/cublas_unified
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
(cuda-gdb) bt
#0  0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
#1  0x00007ffff7f85170 in std::__detail::_Hash_code_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::__detail::_Select1st, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_bucket_index(unsigned long, unsigned long) const () from ./libscuda_12.6.so
#2  0x00007ffff7f84e81 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_bucket_index(unsigned long) const () from ./libscuda_12.6.so
#3  0x00007ffff7f84b87 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from ./libscuda_12.6.so
#4  0x00007ffff7f8450f in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from ./libscuda_12.6.so
#5  0x00007ffff7f74c26 in get_function_pointer(char const*) () from ./libscuda_12.6.so
#6  0x00007ffff7eb8b94 in dlsym () from ./libscuda_12.6.so
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x00007fffd267c356 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
#8  0x00007ffff7fc947e in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7fc9568 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7fe32ca in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffe3ca in ?? ()
#13 0x0000000000000000 in ?? ()
```
You can find our test suite here, which covers the cases we've verified to work: https://github.com/kevmo314/scuda/blob/main/local.sh#L24
We are still working through all the APIs; admittedly, this repo gained visibility much faster than we have been able to wire everything up :)
Most of the APIs only require some tweaks in the annotations file, although knowing which tweaks to make is a bit of an art right now. Some improved debugging tools are also on the roadmap.
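To give a flavor of the kind of tweak involved, here is a sketch (illustrative only: nvmlDeviceGetName just stands in for the general shape, and the direction markers assume the same conventions as the sketch above):

```c
/* A common fix: a pointer the generator treats as input-only actually
 * receives data from the server, so its direction marker must be flipped.
 *
 * before (buffer never copied back to the client):
 *   @param name SEND_ONLY
 * after (server fills the buffer and the RPC returns it):
 *   @param name RECV_ONLY LENGTH:length
 */
nvmlReturn_t nvmlDeviceGetName(nvmlDevice_t device, char *name, unsigned int length);
```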
> Yes, support for the vGPU API should be possible; unfortunately, we don't have any GPUs that support it to develop and test against. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html
@kevmo314 The vGPU I'm referring to is not NVIDIA's official MIG device. It's a technology that similarly uses Linux LD_PRELOAD for its implementation, realized by the Project-HAMi/HAMi-core project. It's also very useful for GPU pooling in data centers.
HAMi-core use case:

```sh
export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50
nvidia-smi
```

```
| 44%   29C    P8    5W /  75W |      0 MiB / 1024 MiB |      0%      Default |
```
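For context on the mechanism, here is a toy sketch of the interception idea (explicitly not HAMi-core's actual code; the TOY_DEVICE_MEMORY_LIMIT_BYTES variable is hypothetical): an LD_PRELOAD'ed library overrides a driver symbol, forwards to the real implementation via dlsym(RTLD_NEXT, ...), and clamps the totals reported back to the application.

```c
/* toy_limit.c: toy sketch of LD_PRELOAD-based memory limiting.
 * Build: gcc -shared -fPIC toy_limit.c -o libtoylimit.so -ldl -I/usr/local/cuda/include
 * Not HAMi-core's implementation; for illustration only. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <cuda.h>

CUresult cuMemGetInfo_v2(size_t *free_bytes, size_t *total_bytes) {
    /* Resolve the real driver entry point once. */
    static CUresult (*real_fn)(size_t *, size_t *) = NULL;
    if (!real_fn)
        real_fn = (CUresult (*)(size_t *, size_t *))dlsym(RTLD_NEXT, "cuMemGetInfo_v2");

    CUresult rc = real_fn(free_bytes, total_bytes);
    if (rc != CUDA_SUCCESS)
        return rc;

    /* Hypothetical env var standing in for CUDA_DEVICE_MEMORY_LIMIT;
     * plain bytes for simplicity. */
    const char *limit_str = getenv("TOY_DEVICE_MEMORY_LIMIT_BYTES");
    if (limit_str) {
        size_t limit = (size_t)strtoull(limit_str, NULL, 10);
        if (limit && *total_bytes > limit) {
            size_t used = *total_bytes - *free_bytes;
            *total_bytes = limit;
            *free_bytes = used < limit ? limit - used : 0;
        }
    }
    return rc;
}
```

One caveat: applications don't always resolve driver entry points through the dynamic linker (e.g., they may go through cuGetProcAddress), which is why interposers also hook dlsym itself, as scuda visibly does in the backtrace above (frame #6).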
@Fruneng I also have a need for this. I'm thinking about how to integrate scuda with HAMi-core so that GPUs gain pooling capability.
@silenceli Great, let's discuss how to implement it.