zhouheyun comments

zhouheyun

Reproduce inference benchmark mentioned in the paper

> Our open-source code ([vllm-project/vllm#4650](https://github.com/vllm-project/vllm/pull/4650)) is not the inference code used in the API platform, so it cannot achieve the throughput mentioned in the paper. @zhouheyun What's the average...