
8 issue results for wanzhenchn

### Motivation I used the scripts to **benchmark s-lora** with lmdeploy 0.2.6 on 2*A30. First I benchmarked only the base model llama2-13b-hf; the performance of the pytorch backend **is obviously lower**...

Suppose a search for second-hand homes in Chaoyang, Beijing returns 34539 listings in total, but the site only displays 100 pages with 30 records per page, so the crawler can output at most 100*30 = 3000 records. Is there no way to fetch the remaining 34539 - 3000 = 31539 listings?
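A common workaround for this kind of pagination cap is to subdivide the search by a filter (e.g. price range) until every sub-query returns fewer results than the cap, then crawl each sub-range separately. Below is a minimal sketch of that idea; `count_listings` is a hypothetical stand-in for a real request that returns the hit count for a price range, and the uniform `fake_count` distribution is invented purely for illustration.

```python
# The site shows at most 100 pages x 30 records = 3000 results per query,
# so a single query can never cover all 34539 listings.
PAGE_CAP = 100 * 30

def split_ranges(lo, hi, count_listings, cap=PAGE_CAP):
    """Recursively bisect the price interval [lo, hi) into sub-ranges
    whose result counts each fit under the pagination cap."""
    n = count_listings(lo, hi)
    if n <= cap or hi - lo <= 1:
        return [(lo, hi, n)]
    mid = (lo + hi) // 2
    return (split_ranges(lo, mid, count_listings, cap)
            + split_ranges(mid, hi, count_listings, cap))

# Toy stand-in: 34539 listings spread uniformly over a price axis.
def fake_count(lo, hi, total=34539, full=(100, 10000)):
    span = full[1] - full[0]
    return round(total * (min(hi, full[1]) - max(lo, full[0])) / span)

ranges = split_ranges(100, 10000, fake_count)
# Every sub-range now fits under the 3000-record cap and can be
# crawled page by page without losing listings.
assert all(n <= PAGE_CAP for _, _, n in ranges)
```

In a real crawler, `count_listings` would issue a filtered search request and read the reported total; any filter with enough distinct values (price, area, district) works as the splitting axis.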

### Motivation The library https://github.com/mit-han-lab/qserve introduces a W4A8KV4 quantization method, called QoQ in the paper (https://arxiv.org/abs/2405.04532), which **delivers performance gains at large batch sizes** compared to other methods (like AWQ-W4A16). > Quantization...

INT4 quantization only delivers **20%~35%** faster inference than FP16 for LLaMA-13b on a single A100 80GB PCIe with batch sizes 1, 2, 4, 8, 16 for prefill length, decode length...