Li Zhang
V100 AWQ/GPTQ support just landed in #2090 and hasn't been released yet.
You can try the nightly build in the meantime: https://github.com/zhyncs/lmdeploy-build/releases/tag/b28a1d0
> And the difference should not be significant on A100. I have roughly verified it using SGLang's Marlin AWQ and LMDeploy TurboMind's AWQ on Llama 3.1 8B Instruct, and their...
The TP size changes the degree of parallelism of the Linear layers along the k dimension, which changes the accumulation order. Floating-point addition is not associative, so different accumulation orders produce slightly different results.
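The accumulation-order effect can be sketched in plain Python. The chunked summation below is a hypothetical stand-in for how TP partitions a Linear layer's k dimension; it is not LMDeploy code:

```python
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(1 << 16)]

def sum_in_chunks(values, num_chunks):
    """Sum `values` as `num_chunks` partial sums reduced at the end,
    mimicking how TP splits the k dimension across ranks."""
    step = len(values) // num_chunks
    partials = [sum(values[i * step:(i + 1) * step]) for i in range(num_chunks)]
    total = 0.0
    for p in partials:  # final reduce over the partial sums
        total += p
    return total

tp1 = sum_in_chunks(xs, 1)
tp2 = sum_in_chunks(xs, 2)
tp4 = sum_in_chunks(xs, 4)

# Float addition is not associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
# The chunked sums agree only up to rounding error, not bit-exactly
print(abs(tp1 - tp2), abs(tp1 - tp4))
```

The same mathematics, evaluated with different grouping, lands on different floating-point results; that is all a TP-size change does to the output.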
My guess is that the 2080 Ti doesn't support bf16.
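A quick way to check: bf16 requires compute capability >= 8.0 (Ampere), while the 2080 Ti is Turing (sm_75). A minimal sketch of that cutoff (`supports_bf16` is a hypothetical helper; with PyTorch you would consult `torch.cuda.get_device_capability()`):

```python
def supports_bf16(major: int, minor: int) -> bool:
    """bf16 needs compute capability >= 8.0 (Ampere and newer).
    The 2080 Ti is Turing (sm_75), below the cutoff."""
    return (major, minor) >= (8, 0)

print(supports_bf16(7, 5))  # 2080 Ti -> False
print(supports_bf16(8, 0))  # A100   -> True
```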
We need to benchmark the ar/ag case on different systems (NVLink/PCIe) first. https://github.com/NVIDIA/nccl-tests
@irexyc The bus bandwidths of all-reduce and all-gather are computed differently.
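Per the nccl-tests performance notes, bus bandwidth is algorithm bandwidth scaled by a collective-specific factor, so the two numbers are not directly comparable. A minimal sketch:

```python
def allreduce_busbw(algbw: float, n: int) -> float:
    # Ring all-reduce moves 2*(n-1)/n of the data per rank
    return algbw * 2 * (n - 1) / n

def allgather_busbw(algbw: float, n: int) -> float:
    # All-gather moves (n-1)/n of the total buffer per rank
    return algbw * (n - 1) / n

# Same algorithm bandwidth on 8 GPUs: the correction factors differ by 2x
print(allreduce_busbw(100.0, 8))  # 175.0
print(allgather_busbw(100.0, 8))  # 87.5
```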
May be fixed by #2201
@irexyc is this still WIP?
The input dim of `attention.output` should be computed as `head_num * head_dim`. The use of `hidden_units_` is a bug.
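A hypothetical config shows why the two can differ (the numbers below are made up; some models decouple `head_dim` from `hidden_size / head_num`):

```python
# Hypothetical model config where hidden_size != head_num * head_dim
hidden_units = 4096
head_num = 32
head_dim = 96  # not hidden_units // head_num (which would be 128)

# Correct input dim of attention.output (o_proj): the concatenated head outputs
attn_output_in_dim = head_num * head_dim
print(attn_output_in_dim)                  # 3072
print(attn_output_in_dim == hidden_units)  # False
```

Using `hidden_units_` happens to work only when `head_dim == hidden_units / head_num`, which is why the bug can go unnoticed on most models.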