
[Docs] Recipe for model conversion: HF -> TRT-LLM

Open • dat-lequoc opened this issue 1 year ago • 1 comment

Add documentation demonstrating how to use a Task to convert Hugging Face models to TensorRT-LLM engines for high-throughput inference.

dat-lequoc • Jun 30 '24 11:06

Thanks a lot, @quocdat-le-insacvl, but this may only make sense alongside an example of deploying the converted model via the Triton server.

peterschmidt85 • Jul 01 '24 14:07

For now, I'm closing the PR. I hope that's OK. Ideally, we can re-open it if we figure out how to do it end-to-end, including deployment.

peterschmidt85 • Jul 11 '24 08:07