llm-foundry
Add TRT ComposerModel inference wrapper
**[WIP] Fix batching**
Adds a wrapper for TRT models, similar to the OpenAI wrappers in this PR. The goal is to evaluate TRT models with our gauntlet, the same way we evaluate HF/ComposerModels.
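A minimal sketch of what such a wrapper might look like. The class name, the `eval_forward` signature, and the batch layout are assumptions for illustration, not the actual llm-foundry implementation; the TRT engine is stubbed as a plain callable so the example is self-contained.

```python
# Hypothetical sketch: expose a TRT engine behind the eval interface
# that a ComposerModel provides, so the gauntlet harness can call it.
# All names here are illustrative assumptions, not the real implementation.

class TRTCausalLMWrapper:
    """Wraps a TRT engine-like callable for gauntlet-style evaluation."""

    def __init__(self, engine, tokenizer=None):
        # engine: callable mapping a token-id sequence to per-token logits.
        self.engine = engine
        self.tokenizer = tokenizer

    def eval_forward(self, batch, outputs=None):
        # Run the engine on each sequence in the batch and collect outputs,
        # mirroring how a ComposerModel's eval_forward returns logits.
        return [self.engine(ids) for ids in batch["input_ids"]]


# Usage with a stub engine standing in for a compiled TRT model:
stub_engine = lambda ids: [[0.0] * 4 for _ in ids]  # fake per-token logits
wrapper = TRTCausalLMWrapper(stub_engine)
logits = wrapper.eval_forward({"input_ids": [[1, 2, 3]]})
```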
Results: https://docs.google.com/spreadsheets/d/1jKJki9QnB8TAt0hkNDQv_DhxMhWb10EtIR8worsLUwU/edit#gid=1219414201
What's the current issue with multi-gpu?