LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
This PR integrates LMDeploy as an alternative inference tool. According to this, LMDeploy reaches 14.42 qps with the LLaMA 7B model on an A100.
Hi @merrymercy, could you please help review this PR?