
Fix TRT-LLM Multigpu Compatibility

Open nik-mosaic opened this issue 1 year ago • 1 comment

[WIP] What does this PR do?

We need to use Composer to run our evaluation framework on TRT-LLM models. Unfortunately, this breaks in the multi-GPU case. These fixes allow Composer to run N copies in parallel and feed data in a way that works with multi-GPU TRT-LLM models. Essentially, the changes are (a) not initializing dist and (b) fixing some race conditions related to data loading.

TODO:

  • Replace commented-out code with parameters we can pass in.
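
As a rough illustration of the TODO above, the commented-out code could give way to a flag along these lines. This is only a sketch: `EvalRunConfig`, `init_dist`, and `select_batches` are hypothetical names, not part of Composer or TRT-LLM, and it assumes that each of the N copies should see every batch when dist is not initialized.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EvalRunConfig:
    # Hypothetical flag (not an actual Composer parameter): when False,
    # the process skips distributed setup and runs as a standalone copy.
    init_dist: bool = True


def select_batches(
    batches: Sequence[int], rank: int, world_size: int, cfg: EvalRunConfig
) -> List[int]:
    """Return the batches this process should evaluate."""
    if cfg.init_dist:
        # Usual data-parallel behaviour: each rank takes a strided shard.
        return list(batches[rank::world_size])
    # Assumption: without dist, every copy is fed the full set of batches.
    return list(batches)


if __name__ == "__main__":
    batches = [0, 1, 2, 3]
    print(select_batches(batches, rank=1, world_size=2, cfg=EvalRunConfig(init_dist=True)))
    print(select_batches(batches, rank=1, world_size=2, cfg=EvalRunConfig(init_dist=False)))
```

Passing the choice in as a parameter keeps both code paths live and testable, instead of toggling behaviour by commenting lines in and out.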

nik-mosaic • Jan 11 '24 13:01

@nik-mosaic is this still relevant?

mvpatel2000 • Mar 19 '24 20:03