
[Timings] Add the ability to log times for async and sync calls

Open dsikka opened this issue 2 years ago • 1 comments

Summary

  • Add the ability to time function calls
  • Enabled by default; pass the --disable-log-stats CLI arg to the server to turn it off, since the timer's initialization and average calculations are now all done within the StatLogger
  • Once enabled, every function decorated with @log_time or @log_async_time is timed, and each measurement is appended to a list so that timings are tracked for every server request
  • Average times are computed and printed to the CLI each time an interval elapses (the interval is controlled by the StatLogger)
  • Measurements are cleared after the average is calculated
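
The flow in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this change: only the decorator names log_time and log_async_time come from the summary, while TimerRegistry, record, flush_averages, and the REGISTRY singleton are hypothetical names chosen for the example.

```python
import asyncio
import functools
import time
from collections import defaultdict


class TimerRegistry:
    """Hypothetical stand-in for the timer state owned by the StatLogger."""

    def __init__(self):
        # Maps a function name to the list of durations (seconds) seen so far.
        self.measurements = defaultdict(list)

    def record(self, name, elapsed):
        self.measurements[name].append(elapsed)

    def flush_averages(self):
        """Return per-name averages, then clear all stored measurements."""
        averages = {
            name: sum(values) / len(values)
            for name, values in self.measurements.items()
            if values
        }
        self.measurements.clear()
        return averages


REGISTRY = TimerRegistry()  # assumed module-level singleton


def log_time(func):
    """Time a synchronous call and record its duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            REGISTRY.record(func.__name__, time.perf_counter() - start)
    return wrapper


def log_async_time(func):
    """Time an awaited coroutine call and record its duration."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            REGISTRY.record(func.__name__, time.perf_counter() - start)
    return wrapper


@log_time
def tokenize(text):  # hypothetical timed function
    return text.split()


@log_async_time
async def generate(prompt):  # hypothetical timed coroutine
    await asyncio.sleep(0.01)
    return prompt.upper()


tokenize("a b c")
asyncio.run(generate("hello"))
averages = REGISTRY.flush_averages()  # what a periodic logger might call
```

In this sketch, flush_averages plays the role of the interval-driven step: it computes the averages and empties the lists in one call, so the next interval starts from a clean slate.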

Remaining Questions:

  • Currently using the Python logger to log the times to the CLI; do we want to print instead?

Testing:

The following can now be used to enable time logging while the server is running:

python -m vllm.entrypoints.api_server --model neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50 --port 5000 

To time an arbitrary block of code without using the decorators, wrap it in the manager's context manager, for example:


import numpy
from timings.utils import get_singleton_manager

# Time the enclosed block and record it under the name "some_name_to_track"
with get_singleton_manager().time("some_name_to_track"):
    x = numpy.sum(...)
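
For readers without the timings package at hand, here is a rough standalone sketch of how a context-manager timer like this could work, including one way the recorded values might be read back. The names get_singleton_manager and time mirror the snippet above, but the TimerManager class, its measurements attribute, and the whole implementation are assumptions made for illustration, not the actual timings.utils code.

```python
from contextlib import contextmanager
from time import perf_counter


class TimerManager:
    """Hypothetical manager; the real one may store and expose data differently."""

    def __init__(self):
        # Maps a tracking name to the list of durations (seconds) recorded for it.
        self.measurements = {}

    @contextmanager
    def time(self, name):
        start = perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed block raises.
            elapsed = perf_counter() - start
            self.measurements.setdefault(name, []).append(elapsed)


_MANAGER = None


def get_singleton_manager():
    """Create the manager on first use and return the same instance afterwards."""
    global _MANAGER
    if _MANAGER is None:
        _MANAGER = TimerManager()
    return _MANAGER


with get_singleton_manager().time("sum_block"):
    total = sum(range(1000))

# Assumed access pattern: read the raw durations back by name.
recorded = get_singleton_manager().measurements["sum_block"]
```

Because the accessor always returns the same instance, timings recorded in different modules end up in one place, which is what would let a single periodic logger average them.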

dsikka, Mar 27 '24 19:03

Can you include a simple test or some starter code, plus an example of how to access the timings, please? I think you can set the max_tokens arg in vLLM to (maybe) fix the number of calls.

horheynm, Mar 28 '24 19:03