ServerlessLLM
[Feature Request] Add shared pinned memory pool for offloading enabled frameworks
Prerequisites
- [x] I have searched existing issues and reviewed documentation.
Problem Description
Currently, sllm store shares parameters only through GPU handles. It would be more beneficial if the pinned memory pool could also be shared, so that offloading-enabled applications can use it.
Proposed Solution
- Add a new memory pool backed by a pinned, shared memory allocator
- Add an interface to get a CPU handle, i.e., shm_name
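To make the two points above more concrete, here is a rough sketch of what such an interface could look like. It is not existing sllm store code: `SharedHostPool`, `get_cpu_handle`, `allocate`, `write`, and `read_from_handle` are hypothetical names, and it uses only Python's standard `multiprocessing.shared_memory`. The pinning step (for example, registering the mapped region with the CUDA runtime) is deliberately left out and only marked in a comment.

```python
from multiprocessing import shared_memory


class SharedHostPool:
    """Hypothetical pool: one named shared-memory segment split into fixed-size chunks."""

    def __init__(self, num_chunks: int, chunk_size: int):
        self.num_chunks = num_chunks
        self.chunk_size = chunk_size
        # Create a named segment that other processes can attach to.
        self._shm = shared_memory.SharedMemory(
            create=True, size=num_chunks * chunk_size
        )
        # Assumption: a real implementation would pin this region here
        # (page-lock it for DMA). That step is omitted in this sketch.
        self._free = list(range(num_chunks))

    def get_cpu_handle(self) -> str:
        """Return the CPU handle, i.e. the shared-memory name (shm_name)."""
        return self._shm.name

    def allocate(self) -> int:
        """Reserve one chunk and return its byte offset inside the segment."""
        if not self._free:
            raise MemoryError("shared host pool exhausted")
        return self._free.pop() * self.chunk_size

    def write(self, offset: int, data: bytes) -> None:
        """Copy data into the chunk that starts at offset."""
        if len(data) > self.chunk_size:
            raise ValueError("data does not fit in one chunk")
        self._shm.buf[offset:offset + len(data)] = data

    def close(self) -> None:
        """Unmap the segment and remove it from the system."""
        self._shm.close()
        self._shm.unlink()


def read_from_handle(shm_name: str, offset: int, length: int) -> bytes:
    """Consumer side: attach by name, copy out a range, then detach."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return bytes(shm.buf[offset:offset + length])
    finally:
        shm.close()
```

In this sketch the store would own the pool and pass `(shm_name, offset)` to an offloading-enabled framework, which attaches by name instead of copying the parameters into its own host buffer.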
Alternatives Considered
No response
Additional Context
No response
Importance
Nice to have
Usage Statistics (Optional)
No response