druid
supervisor: Emit active/publishing task counts
Description
Adding these metrics would help show how much time a supervisor spends publishing tasks. It is important to keep this time low because autoscaling is skipped during this period, which can cause increased lag.
Release note
Adds new metrics: `task/supervisor/active/count` and `task/supervisor/publishing/count`.
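A minimal sketch of how a supervisor could derive the two counts on each run cycle. The `TaskPhase` enum and the `computeCounts` helper are hypothetical and are not Druid's API. The sketch also assumes that "active" means tasks still reading and "publishing" means tasks that have stopped reading and are publishing segments. A real supervisor would send these values to its metrics emitter rather than printing them.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: count supervisor-managed tasks by phase (not Druid's API). */
public class SupervisorTaskCounts {

  /** Hypothetical phase of a streaming task as tracked by the supervisor. */
  public enum TaskPhase { READING, PUBLISHING }

  /** Maps each proposed metric name to the number of tasks in the corresponding phase. */
  public static Map<String, Long> computeCounts(List<TaskPhase> phases) {
    long active = phases.stream().filter(p -> p == TaskPhase.READING).count();
    long publishing = phases.stream().filter(p -> p == TaskPhase.PUBLISHING).count();
    Map<String, Long> metrics = new TreeMap<>(); // sorted so output order is stable
    metrics.put("task/supervisor/active/count", active);
    metrics.put("task/supervisor/publishing/count", publishing);
    return metrics;
  }

  public static void main(String[] args) {
    List<TaskPhase> phases = List.of(TaskPhase.READING, TaskPhase.READING, TaskPhase.PUBLISHING);
    // Stand-in for emitting the metrics: just print name = value.
    computeCounts(phases).forEach((name, value) -> System.out.println(name + " = " + value));
  }
}
```

Sampling both counts on every supervisor run shows how often, and for how long, the publishing count stays non-zero.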
Key changed/added classes in this PR
SeekableStreamSupervisor.java
This PR has:
- [x] been self-reviewed.
- [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in licenses.yaml
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
There have to be some docs changes. How are you going to infer the time spent publishing tasks (btw, what exactly does "a supervisor publishing a task" mean)? And how would you keep that time low, assuming you find that it is high?
@adithyachakilam, leaving some suggestions here even though the PR is in draft right now.
> how much of time a supervisor is spending to publish tasks
Could you please elaborate? What time are you referring to exactly? The supervisor is just a thread which wakes up and launches or kills tasks and updates some metadata.
If you want to capture the time a task spends publishing segments, then the correct metric for that would be something like `ingest/publish/time` (in the same vein as `ingest/handoff/time` and `ingest/merge/time`).
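A rough sketch of how a per-task publish-time metric could be measured. It uses a monotonic clock around the publish step and reports whole milliseconds, in the style of the other `ingest/*/time` metrics. `PublishTimer`, `timePublish`, and the `LongConsumer` sink are hypothetical stand-ins for the task's real publish code and metrics emitter.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical sketch: time a segment-publish step and report it in milliseconds. */
public class PublishTimer {

  /** Runs the publish action, then passes the elapsed milliseconds to the metric sink. */
  public static void timePublish(Runnable publishAction, LongConsumer publishTimeMillisSink) {
    final long startNanos = System.nanoTime(); // monotonic, unaffected by wall-clock changes
    try {
      publishAction.run();
    } finally {
      // Report even if publishing throws, so slow failed publishes remain visible.
      long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
      publishTimeMillisSink.accept(elapsedMillis);
    }
  }

  public static void main(String[] args) {
    timePublish(
        () -> { /* stand-in for publishing segments */ },
        millis -> System.out.println("ingest/publish/time = " + millis));
  }
}
```

Unlike a supervisor-level count, this measures the duration directly on the task that does the publishing.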
If you want to capture the number of tasks currently in the publishing phase, etc., then, as @suneet-s has suggested, emitting the current phase/state of a streaming task in its heartbeat makes sense. But it would need some changes from the current approach:
- The `status` is not an intrinsic property of a task and must not be a part of the `Task` interface. You can inject the runner to build up the heartbeat map in the `CliPeon.heartbeatDimensions()` method.
- For non-streaming tasks, instead of always emitting `UNKNOWN`, do not emit any value for this dimension.
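A sketch of the two suggestions above. The phase comes from a runner-side source rather than from the task itself, and the dimension is simply left out when no phase exists. `PhaseSource`, `heartbeatDimensions(String, PhaseSource)`, and the dimension keys `taskId` and `taskStatus` are all hypothetical. This is not the actual signature of `CliPeon.heartbeatDimensions()`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: build heartbeat dimensions, omitting the phase when unknown. */
public class HeartbeatDimensionsSketch {

  /** Hypothetical runner-side view that may know a streaming task's current phase. */
  public interface PhaseSource {
    /** Current phase for streaming tasks; empty for tasks without one (e.g. batch). */
    Optional<String> currentPhase();
  }

  public static Map<String, String> heartbeatDimensions(String taskId, PhaseSource runner) {
    Map<String, String> dims = new HashMap<>();
    dims.put("taskId", taskId);
    // Add the dimension only when a phase exists; never fall back to "UNKNOWN".
    runner.currentPhase().ifPresent(phase -> dims.put("taskStatus", phase));
    return Collections.unmodifiableMap(dims);
  }

  public static void main(String[] args) {
    System.out.println(heartbeatDimensions("streaming-task", () -> Optional.of("PUBLISHING")));
    System.out.println(heartbeatDimensions("batch-task", Optional::empty));
  }
}
```

Omitting the key, instead of emitting a placeholder value, keeps non-streaming tasks out of any grouping on that dimension.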
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.