TensorRT-LLM
feat: Add canary recipe support for canary-1b and canary-1b flash with new prompt format
Add support for NeMo's Conformer-encoder, Transformer-decoder models (canary-1b and canary-1b flash).
The encoder is a FastConformer encoder. Initial attempts at implementing it with TRT-LLM layers gave worse performance than the ONNX -> TRT path.
I'm working on my own attempts at optimizing TRT versions of Conformer models. If you can share your initial attempts, I can help and contribute back my results. I have a few accepted PRs here and in the ModelOPT repo.
Sure, but this isn't a blocker for merging this PR for now.
@anand-nv Hi, TRT-LLM has now moved its development to GitHub. Can you rebase your MR onto the latest main branch to prepare a fresh MR?
Thanks, June.
@juney-nvidia - rebase done
@juney-nvidia @Shixiaowei02 Can this be reviewed and merged?
Hi @anand-nv, you've done a great job! Do you think this MR will be merged any time soon?