Support disabling the Metal buffer cache to prevent performance degradation caused by large memory caching
## Proposed changes

Reason for this PR:
1. When running LLM inference on devices with smaller memory, such as 8 GB, the speed noticeably decreases as more and more tokens are generated, and...
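A minimal sketch of the idea behind such a switch, assuming a hypothetical `BufferPool` with a `disable_buffer_cache` option; these names are illustrative and not this project's actual API. When the cache is disabled, freed buffers are dropped immediately rather than retained for reuse, so resident memory stays bounded on low-memory devices:

```python
# Hypothetical sketch: a buffer pool whose caching can be switched off so that
# freed buffers are released immediately instead of being kept for reuse.
# Names (BufferPool, disable_buffer_cache) are illustrative, not the project's API.
class BufferPool:
    def __init__(self, disable_buffer_cache: bool = False):
        self.disable_buffer_cache = disable_buffer_cache
        self._free_buffers: dict[int, list[bytearray]] = {}  # size -> cached buffers

    def allocate(self, size: int) -> bytearray:
        # Reuse a cached buffer of the same size when caching is enabled.
        cached = self._free_buffers.get(size)
        if cached:
            return cached.pop()
        return bytearray(size)  # stands in for a real Metal buffer allocation

    def release(self, buf: bytearray) -> None:
        if self.disable_buffer_cache:
            # Drop the buffer right away; memory goes back to the system instead
            # of accumulating in the cache as the context grows.
            return
        self._free_buffers.setdefault(len(buf), []).append(buf)
```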
## Why are these changes needed?

Add multi-LoRA support for vllm_worker; this feature has been supported in vLLM since v0.3.2. This PR enables that capability in vllm_worker.
1. Add a new...
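For reference, a minimal sketch of how multi-LoRA inference is invoked through vLLM's offline API; the model name, adapter name, and adapter path below are placeholders, and the exact wiring inside vllm_worker may differ:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support when constructing the engine; max_loras bounds how many
# adapters can be active in a single batch.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=2)

sampling = SamplingParams(temperature=0.7, max_tokens=64)

# Each request can carry its own adapter; the integer id must be unique per adapter.
outputs = llm.generate(
    ["Summarize the plot of Hamlet in one sentence."],
    sampling,
    lora_request=LoRARequest("example-adapter", 1, "/path/to/lora/adapter"),  # placeholder path
)
print(outputs[0].outputs[0].text)
```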