Xin Ji
**Describe your question** Just wondering if there are any explanations of what some metrics mean (like NVIDIA compute). For example, I'd like to know `MFMA`, `VALU` and so on, while...
### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
-...
Please take a look at the RFC [BatchLLM](https://github.com/vllm-project/vllm/issues/12080) for more details. cc @WoosukKwon @comaniac for next steps.
### Motivation.

This request is mainly for **offline inference scenarios**, based on the paper [BatchLLM](https://arxiv.org/abs/2412.03594).

**TL;DR:** Currently, vLLM performs implicit (or _just-in-time_) shared prefix identification and metadata collection,...