zxy

Results 7 issues of zxy

Thank you for your excellent work! I am currently trying to reproduce KVQuant but have encountered some errors. Your assistance with this matter would be appreciated. ### 1. Reproduce the...

Thanks for your excellent work! Regarding Table 1 of the paper, "Performance comparison of SnapKV and H2O across various LLMs on LongBench", could you provide the scripts/code for reproducing...

## Objective
Align with [vLLM v1 metrics system](https://docs.vllm.ai/en/latest/design/v1/metrics.html) and beyond. We also refer to [SGLang monitoring](https://github.com/sgl-project/sglang/blob/1ab14c4c5c67d0577451764f4a77d685a7dc2db4/examples/monitoring/README.md).
## TODO
- [x] Change `time.perf_counter()`
- [ ] Abstract output processing outside of...

WIP

Just as the title goes.

WIP
documentation

## Modifications
1. Expose deepep env var
   The default deepep buffer num sms raises the following errors on H200 multi-node setups. Therefore, we expose this environment variable to users for configuration....

**Not ready to be merged or fully reviewed yet.** However, since we have already implemented the essential building blocks and passed my naive single request test, I propose a draft...

## Usage
1. quantize
```bash
lmdeploy lite blocked_fp8 ${model_path} --work-dir ${quantized_model_path} --quant-dtype fp8
```
2. test case
NOTE: We can use either `pytorch` or `turbomind` backend for FP8 inference. Here...

enhancement