Fix TRT-LLM Multigpu Compatibility
[WIP] What does this PR do?
We need to use Composer to run our evaluation framework on TRT-LLM models. Unfortunately, this breaks in the multi-GPU case. These fixes let Composer run N copies in parallel and feed data in a way that works with multi-GPU TRT-LLM models. In short, the changes (a) skip initializing `dist` and (b) fix some race conditions in data loading.
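The data-feeding difference can be sketched in plain Python. This rests on an assumption that the PR does not state: the multi-GPU TRT-LLM engine is tensor-parallel, so every rank in a group must receive the same inputs, whereas Composer's usual data-parallel evaluation gives each rank a disjoint shard. `batches_for_rank` and its `shard` flag are hypothetical illustrations, not Composer or TRT-LLM APIs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batches_for_rank(
    dataset: Sequence[T],
    batch_size: int,
    rank: int,
    world_size: int,
    shard: bool = True,
) -> List[List[T]]:
    """Split `dataset` into the batches one rank should consume (hypothetical helper).

    shard=True mimics data-parallel evaluation: each rank takes a disjoint
    stride of the dataset. shard=False (assumed TRT-LLM tensor-parallel mode)
    gives every rank the full dataset in the same order, so all ranks feeding
    one engine see identical inputs.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    items = list(dataset[rank::world_size]) if shard else list(dataset)
    # Chunk the selected items into consecutive batches; the last may be short.
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    data = list(range(8))
    # Data-parallel: rank 1 of 2 gets only the odd-indexed samples.
    print(batches_for_rank(data, 2, rank=1, world_size=2))
    # Tensor-parallel (assumed): rank 1 gets exactly what rank 0 gets.
    print(batches_for_rank(data, 2, rank=1, world_size=2, shard=False))
```

With sharding, rank 1 prints `[[1, 3], [5, 7]]`. Without it, every rank prints `[[0, 1], [2, 3], [4, 5], [6, 7]]`, which is the behavior a tensor-parallel engine would need under the stated assumption.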
TODO:
- Replace commented-out code with parameters we can pass in.
@nik-mosaic, is this still relevant?