Yineng Zhang

Search results: 34 issues by Yineng Zhang

### Motivation Hi all. @lvhan028 @lzhangzz @grimoire @irexyc Recently I discovered an interesting project, [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM/). Its positioning is similar to that of most open-source LLM serving frameworks available today. It integrates libraries such...

### Motivation We plan to add support for `W8A8` SmoothQuant or `FP8 KV Cache` in TurboMind. There is no clear decision yet on which to prioritize. We would...
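For context, FP8 KV-cache quantization typically stores cached tensors with a scale factor so their values fit the e4m3 dynamic range. A minimal NumPy sketch of the idea; the per-tensor scale granularity and the range-clamp-only rounding are simplifying assumptions, not TurboMind's actual scheme:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def quantize_kv_fp8(kv: np.ndarray):
    """Simulate per-tensor symmetric FP8 (e4m3) quantization of a KV-cache block."""
    scale = float(np.abs(kv).max()) / FP8_E4M3_MAX
    scale = max(scale, 1e-12)  # avoid division by zero for an all-zero block
    q = np.clip(kv / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Real FP8 would also round to the e4m3 grid; float32 here only models the range clamp.
    return q.astype(np.float32), scale

def dequantize_kv_fp8(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((2, 8, 64)).astype(np.float32)  # (heads, seq, head_dim)
q, s = quantize_kv_fp8(kv)
recovered = dequantize_kv_fp8(q, s)
```

In practice the scale is chosen per head or per channel rather than per tensor, which keeps the quantization error bounded when value magnitudes vary across heads.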

## Motivation As titled, add support for Medusa. ## Modification ### Finished - [x] 1. Medusa weights conversion - [x] 2. Medusa weights loading - [x] 3. Porting Medusa heads code with LMDeploy components and...
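For readers unfamiliar with Medusa: K extra decoding heads each propose the token k steps ahead, and the base model verifies the drafted continuation, accepting the longest prefix that matches its own greedy choice. A toy pure-Python sketch of the acceptance rule only; the function names are illustrative, not LMDeploy's API:

```python
def medusa_accept(draft_tokens, verify_fn, context):
    """Accept the longest prefix of the Medusa draft the base model agrees with.

    draft_tokens: tokens proposed by the Medusa heads, one per future position.
    verify_fn(context) -> token: the base model's greedy next-token prediction.
    Returns the accepted tokens (at least one, as in speculative decoding).
    """
    accepted = []
    for drafted in draft_tokens:
        expected = verify_fn(context + accepted)
        accepted.append(expected)   # the verified token is always kept
        if expected != drafted:     # first mismatch ends acceptance
            break
    return accepted

# Toy "base model": always continues an arithmetic sequence by +1.
verify = lambda ctx: ctx[-1] + 1
print(medusa_accept([4, 5, 9], verify, [1, 2, 3]))  # → [4, 5, 6]
```

In a real implementation the K verifications happen in a single batched forward pass over a tree of candidates; this loop only shows the accept/reject logic.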

### Feature request @aarnphm @ssheng @parano Hi OpenLLM team, thank you for your exceptional work. Currently, OpenLLM supports two backends, vLLM and PyTorch, with good usability, but there is still...

### Please search before asking - [X] I searched in the [issues](https://github.com/yetone/openai-translator/issues) and found nothing similar. ### Please read README - [X] I have read the troubleshooting section in the...

bug

### Motivation When we use LMDeploy for serving, although raw throughput is also a concern, **more emphasis is placed on throughput under latency constraints at different QPS levels**. This is a performance...
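One common way to operationalize "throughput under latency constraints" is goodput: the offered request rate weighted by the fraction of requests that meet a latency SLO. A toy sketch; the helper name and the 200 ms SLO are illustrative, not a benchmark LMDeploy defines:

```python
def slo_throughput(latencies_s, qps, slo_s=0.2):
    """Effective throughput (req/s): offered QPS scaled by the SLO-attainment rate."""
    ok = sum(1 for t in latencies_s if t <= slo_s)
    return qps * ok / len(latencies_s)

# Toy numbers: at 10 QPS, 8 of 10 sampled requests finish within a 200 ms SLO.
samples = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.19, 0.20, 0.30, 0.50]
print(slo_throughput(samples, qps=10))  # → 8.0
```

Sweeping QPS and plotting this metric shows where a serving system saturates: goodput rises with offered load until queueing pushes latencies past the SLO, then falls.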

https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/ cc @yzh119

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. -...

### Checklist - [X] 1. I have searched related issues but cannot get the expected help. - [X] 2. The bug has not been fixed in the latest version. -...

bug
flashinfer