temporal
Add metrics for nexus schedule-to-start latency
What changed?
Add metric for ScheduleToStart latency of nexus operations.
Why?
How did you test it?
Potential risks
Documentation
Is hotfix candidate?
I would put this on the outbound executor and use the task type tag so we can get this coverage for all outbound tasks:
https://github.com/temporalio/temporal/blob/117847107c19575c30d894ed7571d484146427ac/service/history/outbound_queue_active_task_executor.go#L70
In addition, you could capture the start time before the task is executed and check whether a NotFound error was returned, which indicates (to some degree) that the task was skipped. In that case, I would add a label to the metric saying whether the task was skipped or processed.
This would keep reprocessed tasks (which are expected) from skewing the schedule-to-start latency.
This PR was marked as stale. Please update or close it.