hermes
hermes copied to clipboard
Consumer profiling
This PR introduces the ability to enable subscription profiling for debugging and increased observability. How to enable profiling? Simply modify subscription:
curl PUT 'https://{hermes}/topics/{topic}/subscriptions/{subscription}' \
--header 'Content-Type: application/json' \
--data '{
"profilingEnabled": true
}'
There are 3 types of run:
- Empty run, which means there were no messages to process
Flushing measurements for subscription Topic$subscription and EMPTY run:
StopWatch '': 63.228 milliseconds
----------------------------------------
Milliseconds % Task name
----------------------------------------
17.531625 28% signalsAndSemaphoreAcquire
45.361917 72% messageReceiver.next
00.334458 01% messageConverter.convert
partialMeasurements:
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 17.134792 milliseconds
--------------------------------------------------------
Milliseconds % Task name
--------------------------------------------------------
17.134792 100% signalsInterrupt.run
- Retried, which means message sending failed and message will be retried a) After first attempt
Flushing measurements for subscription Topic$subscription and RETRIED run:
StopWatch '': 38.243167 milliseconds
----------------------------------------
Milliseconds % Task name
----------------------------------------
00.012375 00% signalsAndSemaphoreAcquire
07.452583 19% messageReceiver.next
00.020709 00% messageConverter.convert
00.055333 00% offsetQueue.offerInflightOffset
03.209792 08% trackers.logInflight
01.685542 04% acquireRateLimiter
21.977458 57% messageSender.send
03.829375 10% handlers
partialMeasurements:
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 0.005583 milliseconds
-------------------------------------------------------
Milliseconds % Task name
-------------------------------------------------------
0.005583 100% signalsInterrupt.run
b) Another attempts
Flushing measurements for subscription Topic$subscription and RETRIED run:
StopWatch '': 2556.253125 milliseconds
----------------------------------------
Milliseconds % Task name
----------------------------------------
1005.374625 39% schedule.resend
1532.560209 60% acquireRateLimiter
0016.378125 01% messageSender.send
0001.940166 00% handlers
retryDelayMillis 1000
- Processed, which means message was successfully sent a) After retry
Flushing measurements for subscription Topic$subscription and PROCESSED run:
StopWatch '': 2556.253125 milliseconds
----------------------------------------
Milliseconds % Task name
----------------------------------------
1005.374625 39% schedule.resend
1532.560209 60% acquireRateLimiter
0016.378125 01% messageSender.send
0001.940166 00% handlers
retryDelayMillis 1000
b) After first attempt to send a message which was successful or discarded
Flushing measurements for subscription Topic$subscription and PROCESSED run:
StopWatch '': 23.785336 milliseconds
----------------------------------------
Milliseconds % Task name
----------------------------------------
00.010709 00% signalsAndSemaphoreAcquire
05.867667 25% messageReceiver.next
00.01225 00% messageConverter.convert
00.041459 00% offsetQueue.offerInflightOffset
01.491584 06% trackers.logInflight
01.058667 04% acquireRateLimiter
13.81425 58% messageSender.send
01.48875 06% handlers
partialMeasurements:
signalsInterrupt.run = StopWatch 'signalsInterrupt.run': 0.005 milliseconds
----------------------------------------------------
Milliseconds % Task name
----------------------------------------------------
0.005 100% signalsInterrupt.run
If profiling is enabled then for high volume subscriptions we would logs thousands of logs per second - perhaps we should have a configurable latency threshold above which we would log messages, i.e. if send was slower than X ms.