runner
Improve the reliability of waiting for a job to start before running its steps.
- When a timeline record update fails, automatically retry the update with a short back-off instead of waiting for another update to the same record.
- Collect telemetry on how long it normally takes for the first timeline record to be updated.
- Allow configuring how long to wait for the first job record updates to finish.
The change is controlled by 3 feature flags:
- DistributedTask.EnableJobRecordUpdatedTelemetry
- DistributedTask.EnableRecordUpdateAutoRetry
- DistributedTask.FirstJobRecordUpdateWaitTimeInSeconds
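
The auto-retry behavior described above can be illustrated with a minimal sketch. This is not the runner's actual code: `update_with_retry`, its parameters, and the linear back-off schedule are all hypothetical, chosen only to show the idea of retrying a failed record update after a short delay rather than waiting for the next update to the same record.

```python
import time
from typing import Callable


def update_with_retry(
    update: Callable[[], None],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call update() until it succeeds and return the number of attempts used.

    Hypothetical helper: the attempt limit and delay values are assumptions,
    not values taken from the runner.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            update()
            return attempt
        except Exception:
            # Give up and surface the error once the attempts are exhausted.
            if attempt == max_attempts:
                raise
            # Short linear back-off before the next attempt.
            sleep(base_delay_seconds * attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a caller would normally leave it at the default.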