[Feature] Support --workers when using lite auto_awq
Motivation
Hi all, @lvhan028 @lzhangzz @AllentDan. When I use lite auto_awq to quantize Llama 3.1 405B Instruct, it takes a very long time.
python3 -m lmdeploy lite auto_awq Meta-Llama-3.1-405B-Instruct
Could we support a --workers option similar to TensorRT-LLM's? Thanks.
ref https://nvidia.github.io/TensorRT-LLM/performance/perf-overview.html#engine-building
--workers affects the number of threads that build the engine file and does not necessarily need to match the TP size.
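Concretely, the requested usage might look something like the line below; the --workers flag (shown here with a value of 4) is the proposed addition, not an existing lmdeploy option:
python3 -m lmdeploy lite auto_awq Meta-Llama-3.1-405B-Instruct --workers 4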
Related resources
No response
Additional context
No response
For example, on a machine with 4 A100 or H100 GPUs, lite auto_awq currently utilizes only one of them, leaving the remaining 3 idle.
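Below is a minimal sketch of the kind of parallelism a --workers option could enable, assuming per-layer AWQ calibration can be sharded across GPUs; the helper names (quantize_shard, run_parallel_awq) are hypothetical and not part of lmdeploy's API:

```python
import torch
import torch.multiprocessing as mp


def quantize_shard(rank: int, shards: list[list[int]]) -> None:
    # Hypothetical worker: pin this process to one GPU and quantize its
    # share of the decoder layers (AWQ scale search per layer).
    torch.cuda.set_device(rank)
    for layer_id in shards[rank]:
        # Placeholder for: load layer weights, run calibration, save scales.
        pass


def run_parallel_awq(num_workers: int, num_layers: int) -> None:
    # Round-robin the decoder layers across the requested workers so that
    # all available GPUs stay busy instead of only one.
    shards = [list(range(num_layers))[i::num_workers] for i in range(num_workers)]
    mp.spawn(quantize_shard, args=(shards,), nprocs=num_workers)


if __name__ == "__main__":
    # e.g. 4 otherwise-idle A100/H100 GPUs; Llama 3.1 405B has 126 decoder layers.
    run_parallel_awq(num_workers=4, num_layers=126)
```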