hermes icon indicating copy to clipboard operation
hermes copied to clipboard

Consumer profiling

Open szczygiel-m opened this issue 9 months ago • 1 comments

This PR introduces the ability to enable subscription profiling for debugging and increased observability. How to enable profiling? Simply modify subscription:

curl PUT 'https://{hermes}/topics/{topic}/subscriptions/{subscription}' \
--header 'Content-Type: application/json' \
--data '{
    "profilingEnabled": true
}'

There are 3 types of run:

  1. Empty run, which means there were no messages to process
Flushing measurements for subscription Topic$subscription and EMPTY run: 
 StopWatch '': 63.228 milliseconds
----------------------------------------
Milliseconds  %       Task name
----------------------------------------
17.531625     28%     signalsAndSemaphoreAcquire
45.361917     72%     messageReceiver.next
00.334458     01%     messageConverter.convert
 partialMeasurements: 
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 17.134792 milliseconds
--------------------------------------------------------
Milliseconds  %       Task name
--------------------------------------------------------
17.134792     100%    signalsInterrupt.run
  1. Retried, which means message sending failed and message will be retried a) After first attempt
Flushing measurements for subscription Topic$subscription and RETRIED run: 
 StopWatch '': 38.243167 milliseconds
----------------------------------------
Milliseconds  %       Task name
----------------------------------------
00.012375     00%     signalsAndSemaphoreAcquire
07.452583     19%     messageReceiver.next
00.020709     00%     messageConverter.convert
00.055333     00%     offsetQueue.offerInflightOffset
03.209792     08%     trackers.logInflight
01.685542     04%     acquireRateLimiter
21.977458     57%     messageSender.send
03.829375     10%     handlers
 partialMeasurements: 
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 0.005583 milliseconds
-------------------------------------------------------
Milliseconds  %       Task name
-------------------------------------------------------
0.005583      100%    signalsInterrupt.run

b) Another attempts

Flushing measurements for subscription Topic$subscription and RETRIED run: 
 StopWatch '': 2556.253125 milliseconds
----------------------------------------
Milliseconds  %       Task name
----------------------------------------
1005.374625   39%     schedule.resend
1532.560209   60%     acquireRateLimiter
0016.378125   01%     messageSender.send
0001.940166   00%     handlers
 retryDelayMillis 1000
  1. Processed, which means message was successfully sent a) After retry
Flushing measurements for subscription Topic$subscription and PROCESSED run: 
 StopWatch '': 2556.253125 milliseconds
----------------------------------------
Milliseconds  %       Task name
----------------------------------------
1005.374625   39%     schedule.resend
1532.560209   60%     acquireRateLimiter
0016.378125   01%     messageSender.send
0001.940166   00%     handlers
 retryDelayMillis 1000

b) After first attempt to send a message which was successful or discarded

Flushing measurements for subscription Topic$subscription and PROCESSED run: 
 StopWatch '': 23.785336 milliseconds
----------------------------------------
Milliseconds  %       Task name
----------------------------------------
00.010709     00%     signalsAndSemaphoreAcquire
05.867667     25%     messageReceiver.next
00.01225      00%     messageConverter.convert
00.041459     00%     offsetQueue.offerInflightOffset
01.491584     06%     trackers.logInflight
01.058667     04%     acquireRateLimiter
13.81425      58%     messageSender.send
01.48875      06%     handlers
 partialMeasurements: 
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 0.005 milliseconds
----------------------------------------------------
Milliseconds  %       Task name
----------------------------------------------------
0.005         100%    signalsInterrupt.run

szczygiel-m avatar May 04 '24 17:05 szczygiel-m

If profiling is enabled then for high volume subscriptions we would logs thousands of logs per second - perhaps we should have a configurable latency threshold above which we would log messages, i.e. if send was slower than X ms.

moscicky avatar May 20 '24 12:05 moscicky