Support disabling the Metal buffer cache to prevent performance degradation caused by large memory caching
## Proposed changes

Reason for this PR:
1. When running LLM inference on devices with smaller memory, such as 8 GB, the speed noticeably decreases as more and more tokens are generated, and...
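A minimal sketch of the idea behind such a switch, assuming a hypothetical `BufferPool` with a `disable_buffer_cache` option; these names are illustrative and not this project's actual API. When the cache is disabled, freed buffers are dropped immediately rather than retained for reuse, so resident memory stays bounded on low-memory devices:

```python
# Hypothetical sketch: a buffer pool whose caching can be switched off so that
# freed buffers are released immediately instead of being kept for reuse.
# Names (BufferPool, disable_buffer_cache) are illustrative, not the project's API.
class BufferPool:
    def __init__(self, disable_buffer_cache: bool = False):
        self.disable_buffer_cache = disable_buffer_cache
        self._free_buffers: dict[int, list[bytearray]] = {}  # size -> cached buffers

    def allocate(self, size: int) -> bytearray:
        # Reuse a cached buffer of the same size when caching is enabled.
        cached = self._free_buffers.get(size)
        if cached:
            return cached.pop()
        return bytearray(size)  # stands in for a real Metal buffer allocation

    def release(self, buf: bytearray) -> None:
        if self.disable_buffer_cache:
            # Drop the buffer right away; memory goes back to the system instead
            # of accumulating in the cache as the context grows.
            return
        self._free_buffers.setdefault(len(buf), []).append(buf)
```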
## Why are these changes needed?

Add multi-LoRA support for vllm_worker; this feature has been supported in vLLM since v0.3.2. This PR enables that capability in vllm_worker.
1. Add a new...
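For reference, a minimal sketch of how multi-LoRA inference is invoked through vLLM's offline API; the model name, adapter name, and adapter path below are placeholders, and the exact wiring inside vllm_worker may differ:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support when constructing the engine; max_loras bounds how many
# adapters can be active in a single batch.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=2)

sampling = SamplingParams(temperature=0.7, max_tokens=64)

# Each request can carry its own adapter; the integer id must be unique per adapter.
outputs = llm.generate(
    ["Summarize the plot of Hamlet in one sentence."],
    sampling,
    lora_request=LoRARequest("example-adapter", 1, "/path/to/lora/adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```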