[Feature] Support --workers when using lite auto_awq
Motivation
Hi all, @lvhan028 @lzhangzz @AllentDan. When I use lite auto_awq to quantize Llama 3.1 405B Instruct, it takes a very long time.
python3 -m lmdeploy lite auto_awq Meta-Llama-3.1-405B-Instruct
Could we support a --workers option similar to TensorRT-LLM's? Thanks.
ref https://nvidia.github.io/TensorRT-LLM/performance/perf-overview.html#engine-building
--workers affects the number of threads that build the engine file and does not necessarily need to match the TP size.
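Concretely, the requested usage might look something like the line below; the --workers flag (shown here with a value of 4) is the proposed addition, not an existing lmdeploy option:
python3 -m lmdeploy lite auto_awq Meta-Llama-3.1-405B-Instruct --workers 4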
Related resources
No response
Additional context
No response
For example, on a machine with 4 A100 or H100 GPUs, lite auto_awq currently utilizes only one of them, leaving the remaining 3 idle.
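Below is a minimal sketch of the kind of parallelism a --workers option could enable, assuming per-layer AWQ calibration can be sharded across GPUs; the helper names (quantize_shard, run_parallel_awq) are hypothetical and not part of lmdeploy's API:

```python
import torch
import torch.multiprocessing as mp


def quantize_shard(rank: int, shards: list[list[int]]) -> None:
    # Hypothetical worker: pin this process to one GPU and quantize its
    # share of the decoder layers (AWQ scale search per layer).
    torch.cuda.set_device(rank)
    for layer_id in shards[rank]:
        # Placeholder for: load layer weights, run calibration, save scales.
        pass


def run_parallel_awq(num_workers: int, num_layers: int) -> None:
    # Round-robin the decoder layers across the requested workers so that
    # all available GPUs stay busy instead of only one.
    shards = [list(range(num_layers))[i::num_workers] for i in range(num_workers)]
    mp.spawn(quantize_shard, args=(shards,), nprocs=num_workers)


if __name__ == "__main__":
    # e.g. 4 otherwise-idle A100/H100 GPUs; Llama 3.1 405B has 126 decoder layers.
    run_parallel_awq(num_workers=4, num_layers=126)
```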