Yineng Zhang

Search results: 34 issues by Yineng Zhang

### Motivation Hi all. @lvhan028 @lzhangzz @grimoire @irexyc Recently I discovered an interesting project, [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM/). Its positioning is similar to that of most open-source LLM serving frameworks available today. It integrates libraries such...

### Motivation We plan to add support for `W8A8` SmoothQuant or `FP8 KV Cache` in TurboMind. There is no clear decision yet on which to prioritize. We would...
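For context, FP8 KV-cache quantization typically stores cached tensors with a scale factor so their values fit the e4m3 dynamic range. A minimal NumPy sketch of the idea; the per-tensor scale granularity and the range-clamp-only rounding are simplifying assumptions, not TurboMind's actual scheme:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def quantize_kv_fp8(kv: np.ndarray):
    """Simulate per-tensor symmetric FP8 (e4m3) quantization of a KV-cache block."""
    scale = float(np.abs(kv).max()) / FP8_E4M3_MAX
    scale = max(scale, 1e-12)  # avoid division by zero for an all-zero block
    q = np.clip(kv / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Real FP8 would also round to the e4m3 grid; float32 here only models the range clamp.
    return q.astype(np.float32), scale

def dequantize_kv_fp8(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((2, 8, 64)).astype(np.float32)  # (heads, seq, head_dim)
q, s = quantize_kv_fp8(kv)
recovered = dequantize_kv_fp8(q, s)
```

In practice the scale is chosen per head or per channel rather than per tensor, which keeps the quantization error bounded when value magnitudes vary across heads.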

## Motivation As titled, add support for Medusa. ## Modification ### Finished - [x] 1. Medusa weights conversion - [x] 2. Medusa weights loading - [x] 3. Porting Medusa heads code with LMDeploy components and...
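For readers unfamiliar with Medusa: K extra decoding heads each propose the token k steps ahead, and the base model verifies the drafted continuation, accepting the longest prefix that matches its own greedy choice. A toy pure-Python sketch of the acceptance rule only; the function names are illustrative, not LMDeploy's API:

```python
def medusa_accept(draft_tokens, verify_fn, context):
    """Accept the longest prefix of the Medusa draft the base model agrees with.

    draft_tokens: tokens proposed by the Medusa heads, one per future position.
    verify_fn(context) -> token: the base model's greedy next-token prediction.
    Returns the accepted tokens (at least one, as in speculative decoding).
    """
    accepted = []
    for drafted in draft_tokens:
        expected = verify_fn(context + accepted)
        accepted.append(expected)   # the verified token is always kept
        if expected != drafted:     # first mismatch ends acceptance
            break
    return accepted

# Toy "base model": always continues an arithmetic sequence by +1.
verify = lambda ctx: ctx[-1] + 1
print(medusa_accept([4, 5, 9], verify, [1, 2, 3]))  # → [4, 5, 6]
```

In a real implementation the K verifications happen in a single batched forward pass over a tree of candidates; this loop only shows the accept/reject logic.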

### Feature request @aarnphm @ssheng @parano Hi OpenLLM team, thank you for your exceptional work. Currently, OpenLLM supports two backends, vLLM and PyTorch, with good usability, but there is still...

### Please search before asking - [X] I searched in the [issues](https://github.com/yetone/openai-translator/issues) and found nothing similar. ### Please read README - [X] I have read the troubleshooting section in the...

bug

### Motivation When we use LMDeploy for serving, although raw throughput is also a concern, **more emphasis is placed on throughput under latency constraints at different QPS levels**. This is a performance...
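One common way to operationalize "throughput under latency constraints" is goodput: the offered request rate weighted by the fraction of requests that meet a latency SLO. A toy sketch; the helper name and the 200 ms SLO are illustrative, not a benchmark LMDeploy defines:

```python
def slo_throughput(latencies_s, qps, slo_s=0.2):
    """Effective throughput (req/s): offered QPS scaled by the SLO-attainment rate."""
    ok = sum(1 for t in latencies_s if t <= slo_s)
    return qps * ok / len(latencies_s)

# Toy numbers: at 10 QPS, 8 of 10 sampled requests finish within a 200 ms SLO.
samples = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.19, 0.20, 0.30, 0.50]
print(slo_throughput(samples, qps=10))  # → 8.0
```

Sweeping QPS and plotting this metric shows where a serving system saturates: goodput rises with offered load until queueing pushes latencies past the SLO, then falls.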

https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/ cc @yzh119

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. -...

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. -...

bug
flashinfer