cortex.cpp
Add Multi-GPU Support for LlamaCpp Engine
Description
We need to implement multi-GPU support in our LlamaCpp wrapper engine so that users can effectively utilize multiple GPUs and improve inference performance.
Goals
- Allow users to choose which available GPUs to use for running the engine
- Implement load balancing across selected GPUs
- Maintain compatibility with single-GPU setups
Proposed Implementation
- Detect available GPUs on the system
- Add a configuration option for users to specify which GPUs to use
- Modify the wrapper engine to distribute workload across selected GPUs
Acceptance Criteria
- [ ] Users can specify which GPUs to use via configuration
- [ ] The engine correctly utilizes all selected GPUs
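If this lands in model.yml (see the open question below), the configuration surface might look something like this. All field names here are illustrative placeholders, not a committed schema; `tensor_split` and `main_gpu` echo the corresponding llama.cpp parameters.

```yaml
# Illustrative sketch only -- field names are not final
gpus: [0, 1]          # GPU indices the engine may use
tensor_split: auto    # or explicit fractions, e.g. [0.25, 0.75]
main_gpu: 0           # device that hosts small tensors and scratch buffers
```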
Additional Considerations
- Ensure proper error handling for scenarios where specified GPUs are unavailable
- Will we add this feature to model.yml for model management?
- Does this feature work for both the CLI and the API?