dstack
[Docs] Recipe for model conversion: HF -> TRT-LLM
Add documentation demonstrating how to use a Task to convert Hugging Face models to TensorRT-LLM engines for high-throughput inference.
Thanks a lot, @quocdat-le-insacvl, but this may only make sense together with an example of deploying the converted model via the Triton server.
For now, I'm closing the PR. I hope that's OK. Ideally, we can re-open it if we figure out how to do it end-to-end, including deployment.