
[Feature] support workers when using lite autoawq

Open zhyncs opened this issue 1 year ago • 1 comments

Motivation

Hi all. @lvhan028 @lzhangzz @AllentDan When I use lite auto_awq to quantize Llama 3.1 405B Instruct, it takes a very long time.

python3 -m lmdeploy lite auto_awq Meta-Llama-3.1-405B-Instruct

Could we support a --workers option similar to TensorRT-LLM's? Thanks.

ref https://nvidia.github.io/TensorRT-LLM/performance/perf-overview.html#engine-building

In TensorRT-LLM, --workers controls the number of threads used to build the engine files, and it does not need to match the TP size.

Related resources

No response

Additional context

No response

zhyncs avatar Jul 28 '24 16:07 zhyncs

For example, with 4 A100s or H100s available, lite auto_awq can only utilize one of them, leaving the remaining 3 idle.
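A minimal sketch of what the requested --workers behavior could look like: shard the per-layer quantization work round-robin across several GPUs instead of running everything on one device. This is purely illustrative, not lmdeploy's actual API; quantize_layer is a hypothetical stand-in for the real per-layer AWQ step, and threads are used because each worker would mostly be driving a different GPU.

```python
from concurrent.futures import ThreadPoolExecutor

def quantize_layer(layer_idx, device_id):
    # A real implementation would move this layer to f"cuda:{device_id}"
    # and run the AWQ scale search / weight quantization there.
    # Here we only record which device handled which layer.
    return layer_idx, device_id

def quantize_model(num_layers, workers):
    # Assign decoder layers to worker GPUs round-robin, then run the
    # per-layer quantization jobs concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(quantize_layer, i, i % workers)
                   for i in range(num_layers)]
        return dict(f.result() for f in futures)
```

With 8 layers and workers=4, layers 0..7 would be spread over devices 0..3 rather than all landing on device 0.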
