Li Zhang

Results 72 comments of Li Zhang

You can try the nightly build first: https://github.com/zhyncs/lmdeploy-build/releases/tag/b28a1d0

> And the difference should not be significant on A100. I have roughly verified it using SGLang's Marlin AWQ and LMDeploy TurboMind's AWQ on Llama 3.1 8B Instruct, and their...

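The TP degree affects the degree of parallelism of the Linear layers along the k dimension, which changes the accumulation order. Floating-point addition is not associative, so different accumulation orders produce slightly different results.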
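A minimal sketch of the accumulation-order effect, in plain Python. `split_k_dot` is a hypothetical helper, not LMDeploy code: it splits the k dimension of a dot product into `tp` chunks, accumulates each chunk separately, then sums the partial results, which is roughly what happens when a Linear layer is sharded along k. The input values are deliberately extreme so the difference is obvious; in real models the discrepancy is usually in the last few bits.

```python
def split_k_dot(x, w, tp):
    """Dot product with the k dimension split into `tp` equal chunks.

    Hypothetical helper for illustration: each "rank" accumulates its own
    chunk sequentially, then the partial sums are reduced in rank order.
    """
    k = len(x)
    assert k % tp == 0, "k must be divisible by tp"
    chunk = k // tp

    partials = []
    for rank in range(tp):
        acc = 0.0
        for i in range(rank * chunk, (rank + 1) * chunk):
            acc += x[i] * w[i]
        partials.append(acc)

    total = 0.0
    for p in partials:
        total += p
    return total


# Exaggerated inputs: the small terms are lost when added to 1e16.
x = [1e16, 1.0, -1e16, 1.0]
w = [1.0, 1.0, 1.0, 1.0]

print(split_k_dot(x, w, tp=1))  # 1.0
print(split_k_dot(x, w, tp=2))  # 0.0

# The same effect at small scale:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

Both calls compute the same mathematical sum; only the grouping of the additions differs.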

We need to benchmark the all-reduce/all-gather (ar/ag) case on different systems (NVLink/PCIe) first. https://github.com/NVIDIA/nccl-tests

@irexyc the bus bandwidth of all-reduce and all-gather is computed differently.
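For reference, a small sketch of the correction factors as I understand them from the nccl-tests performance notes: bus bandwidth is the algorithm bandwidth (size / time) scaled by `2 * (n - 1) / n` for all-reduce and by `(n - 1) / n` for all-gather, where `n` is the number of ranks. The function name `bus_bandwidth_gbps` and its parameters are hypothetical, and which byte count each benchmark uses as "size" should be checked against the nccl-tests documentation.

```python
# Correction factors applied to algorithm bandwidth, per collective.
BUSBW_FACTOR = {
    "all_reduce": lambda n: 2.0 * (n - 1) / n,
    "all_gather": lambda n: (n - 1) / n,
}


def bus_bandwidth_gbps(op, size_bytes, time_s, n_ranks):
    """Return (algbw, busbw) in GB/s for one measured collective.

    Hypothetical helper: `size_bytes` is assumed to follow the size
    convention nccl-tests uses for the given collective.
    """
    algbw = size_bytes / time_s / 1e9
    busbw = algbw * BUSBW_FACTOR[op](n_ranks)
    return algbw, busbw


# Same size and time, 8 ranks: the reported bus bandwidth differs by 2x.
ar_alg, ar_bus = bus_bandwidth_gbps("all_reduce", 1e9, 0.5, 8)
ag_alg, ag_bus = bus_bandwidth_gbps("all_gather", 1e9, 0.5, 8)
print(ar_alg, ar_bus)  # 2.0 3.5
print(ag_alg, ag_bus)  # 2.0 1.75
```

Because the factors differ, raw algorithm bandwidth numbers for the two collectives are not directly comparable; compare bus bandwidth instead.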

The input dim of `attention.output` should be computed as `head_num * head_dim`. The use of `hidden_units_` is a bug.
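To illustrate why the two quantities are not interchangeable, here is a sketch with a hypothetical configuration in which `head_num * head_dim` differs from the hidden size. The names and numbers are assumptions for illustration and are not taken from the TurboMind source.

```python
def attention_output_in_dim(head_num, head_dim):
    """Input dim of the attention output projection: the concatenated heads."""
    return head_num * head_dim


# Hypothetical model config where the hidden size is NOT head_num * head_dim.
hidden_units = 3072
head_num = 16
head_dim = 256

in_dim = attention_output_in_dim(head_num, head_dim)
out_dim = hidden_units

print((in_dim, out_dim))       # (4096, 3072)
print(in_dim == hidden_units)  # False: sizing the weight by hidden_units would be wrong
```

For models where `hidden_units == head_num * head_dim` the two choices coincide, which is why such a bug can go unnoticed until a model breaks that equality.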