truss-examples

Profiling async TRT-LLM

Open aspctu opened this issue 2 years ago • 0 comments

This PR adds a Truss for profiling the effects of an async Truss Server and the Triton Server.

This implementation is not optimized for production: it makes heavy use of logging and timestamping.

A couple of gotchas worth noting:

  • In the (non-async) Triton GRPC client we use, the GRPC stream processor is spawned in a new thread. This processor takes a callback and invokes it in that thread (source here).
  • The callback we've defined, which is passed to the GRPC stream processor mentioned above, writes responses from Triton to the queue associated with the corresponding request. Because we use an asyncio.Queue, this callback is async and therefore needs an event loop in order to run. However, as noted in the point above, the callback is executed in a new thread, which has no event loop. Hence the logic around here.
  • The environment also does not await our async callback, hence the logic here.
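
The two threading gotchas above can be reproduced without Triton. The following is a minimal sketch, not the code from this PR: `fake_stream_processor` is a hypothetical stand-in for the client's stream-processing thread, and `on_response`, `make_sync_callback`, and `collect` are illustrative names. It assumes one way of bridging the gap, namely wrapping the async callback in a plain function that schedules it on the loop owning the queue via `asyncio.run_coroutine_threadsafe`; the PR's own logic may differ.

```python
import asyncio
import threading


async def on_response(queue: asyncio.Queue, response) -> None:
    # Async callback: hand one response to the per-request queue.
    await queue.put(response)


def make_sync_callback(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue):
    # The worker thread calls the callback like a plain function and never
    # awaits it, and that thread has no event loop of its own. So the wrapper
    # schedules the coroutine on the loop that owns the queue instead.
    def callback(response) -> None:
        asyncio.run_coroutine_threadsafe(on_response(queue, response), loop)

    return callback


def fake_stream_processor(callback, responses) -> None:
    # Hypothetical stand-in for the client's stream-processing thread.
    for response in responses:
        callback(response)
    callback(None)  # sentinel marking the end of the stream


async def collect(responses):
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    worker = threading.Thread(
        target=fake_stream_processor,
        args=(make_sync_callback(loop, queue), responses),
    )
    worker.start()

    received = []
    while (item := await queue.get()) is not None:
        received.append(item)

    # Join off the event loop so pending loop callbacks are never blocked.
    await asyncio.to_thread(worker.join)
    return received


if __name__ == "__main__":
    print(asyncio.run(collect(["token-1", "token-2", "token-3"])))
```

Scheduling onto the existing loop keeps every queue operation on the thread that owns the `asyncio.Queue`, which matters because `asyncio.Queue` is not thread-safe. Creating a fresh event loop inside the worker thread is another option, but the queue would then be touched from two threads.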

aspctu · Dec 02 '23 02:12