sdk-core
[Feature Request] Investigate ways to determine if worker polling is healthy
Is your feature request related to a problem? Please describe.
There is currently no easy way to know if a worker's poll calls are failing. Users want to make a call on the worker to know whether it's healthy or backing off due to server failure.
Describe the solution you'd like
TBD. Options:
- Create metrics like `workflow_task_queue_poll_failed` and `activity_task_queue_poll_succeed`/`activity_task_queue_poll_failure` and encourage checking those metrics
  - A bit hacky to ask users to do manual subtraction and state management
  - These metrics have value anyways, we should probably add them.
  - `long_request_failure` is not very detailed (but technically good enough if we exposed a way to create custom metric labels per client).
- Populate some kind of internal `std::sync::atomic::AtomicBool` for whether the last poll calls are successful for a worker (or client) and expose some kind of getter to check them
- Support for general gRPC interceptors from lang through Rust could help advanced uses like this and others
- Some other on-poll-failed callback mechanism?
- Customize retry logic for workers so users can opt in to failing workers more aggressively
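
To make the metrics and `AtomicBool` options above more concrete, here is a minimal sketch of what a per-worker health tracker could look like. Everything in it is hypothetical: `PollHealth`, `record`, `is_healthy`, and `counts` are illustrative names, not existing sdk-core APIs, and a real version would need to be called from the poll loop and probably split by workflow and activity task queue.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical poll health tracker (not an existing sdk-core type).
/// Combines the "last poll succeeded" flag with success/failure counters.
#[derive(Debug)]
pub struct PollHealth {
    last_poll_ok: AtomicBool,
    succeeded: AtomicU64,
    failed: AtomicU64,
}

impl PollHealth {
    pub fn new() -> Self {
        Self {
            // Assumption: a worker counts as healthy until a poll fails.
            last_poll_ok: AtomicBool::new(true),
            succeeded: AtomicU64::new(0),
            failed: AtomicU64::new(0),
        }
    }

    /// Record the outcome of one poll call. Only success or failure is
    /// inspected, so any `Result` type can be passed in.
    pub fn record<T, E>(&self, result: &Result<T, E>) {
        match result {
            Ok(_) => {
                self.succeeded.fetch_add(1, Ordering::Relaxed);
                self.last_poll_ok.store(true, Ordering::Relaxed);
            }
            Err(_) => {
                self.failed.fetch_add(1, Ordering::Relaxed);
                self.last_poll_ok.store(false, Ordering::Relaxed);
            }
        }
    }

    /// Getter a user could call: did the most recent poll succeed?
    pub fn is_healthy(&self) -> bool {
        self.last_poll_ok.load(Ordering::Relaxed)
    }

    /// Returns `(succeeded, failed)` totals since the tracker was created.
    pub fn counts(&self) -> (u64, u64) {
        (
            self.succeeded.load(Ordering::Relaxed),
            self.failed.load(Ordering::Relaxed),
        )
    }
}

impl Default for PollHealth {
    fn default() -> Self {
        Self::new()
    }
}
```

A single flag only reflects the most recent poll, so a flapping connection would toggle it on every call. Keeping the counters alongside it would let callers apply their own threshold without doing the metric subtraction by hand.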