nm-vllm
[Timings] Add the ability to log times for async and sync calls
Summary
- Add the ability to time function calls
- Will be enabled unless the `--disable-log-stats` CLI arg is used for the server, as the timer's init and average calculations are now all done within the `StatLogger`
- Once enabled, all functions decorated with `@log_time` and `@log_async_time` will be timed, with their measurements added to a list for every server request made
- Average time values are computed and printed to the CLI after a time interval has passed (controlled by the `StatLogger`)
- Measurements are cleared after the average is calculated
Remaining Questions:
- Currently using the python logger to log the times to the cli; do we want to print instead?
Testing:
The following can now be used to enable time logging while the server is running:
```shell
python -m vllm.entrypoints.api_server --model neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50 --port 5000
```
For any case where we want to time arbitrary blocks of code, without the use of decorators, the following is an example of how the code can be updated:
```python
import numpy

from timings.utils import get_singleton_manager

with get_singleton_manager().time("some_name_to_track"):
    x = numpy.sum(...)
```
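A self-contained sketch of what such a `time()` context manager could look like, and how the recorded values might be read back afterwards. The `_Timings` class below is illustrative only and stands in for the real manager returned by `get_singleton_manager()`:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class _Timings:
    """Minimal stand-in for the timings manager, for illustration only."""

    def __init__(self):
        # One list of elapsed durations (seconds) per tracked name.
        self.measurements = defaultdict(list)

    @contextmanager
    def time(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.measurements[name].append(time.perf_counter() - start)


manager = _Timings()

with manager.time("some_name_to_track"):
    total = sum(range(1000))

# Each entry under the tracked name is one elapsed duration in seconds.
durations = manager.measurements["some_name_to_track"]
```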
Can you include a simple test or some starter code, and an example of how to access the timings, please? I think setting the max_tokens arg in vLLM might (maybe) fix the number of calls.