Detect segment unavailability for queries in broker
Description
The Broker maintains a timeline of segments, which it builds over time from updates received from Historical servers, and it uses this timeline to answer queries. The Broker isn't aware of which segments actually exist in the Druid system. The result of this gap is incomplete query responses on some occasions.
The goal of this feature is to ensure that if a segment was queryable at one point in time, any future query over that segment will either include that segment or fail (unless its replication factor is changed to 0).
Design
This change introduces a new state for segment metadata called `loaded`, which is set to true when a segment has been loaded onto some Historical.
The Broker polls the Coordinator periodically to get all the used segments in the system and merges all the loaded segments from this set into its timeline. The timeline then consists of segments that are available on some Historical server as well as segments that aren't available on any server; this information lets the Broker identify segments that are unavailable for a query.
This approach also ensures that any segment which has just been published but not loaded by any historical server doesn’t cause query failure.
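A minimal sketch of this poll-and-merge step, assuming hypothetical types (segment ids as plain strings, the poll result as a map from segment id to its loaded flag); it only illustrates the rule described above, not the actual classes involved:

```java
import java.util.Map;
import java.util.Set;

public class BrokerTimelineSync
{
  /**
   * Merges one Coordinator poll result into the Broker's set of known segments.
   * Hypothetical signature for illustration only.
   */
  public static void mergePoll(Map<String, Boolean> usedSegmentToLoaded, Set<String> timeline)
  {
    // Drop segments that are no longer used, i.e. absent from the poll result.
    timeline.removeIf(segmentId -> !usedSegmentToLoaded.containsKey(segmentId));

    // Add segments that are used and have been loaded on some Historical at least once.
    usedSegmentToLoaded.forEach((segmentId, loaded) -> {
      if (Boolean.TRUE.equals(loaded)) {
        timeline.add(segmentId);
      }
    });

    // Segments that are used but never loaded stay out of the timeline, so a freshly
    // published segment does not cause query failures.
  }
}
```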
Major changes
Coordinator changes
- Add a new column `has_loaded` in the `druid_segments` metadata table to represent whether a segment has ever been loaded on a Historical (changes in `SQLMetadataConnector`).
- Set the `has_loaded` field for a segment to true when the Coordinator receives a load notification from the Historical (see the sketch after this list).
- Update `DataSourcesSnapshot` to maintain a diff of the segments from the previous poll.
- Add Coordinator API `MetadataResource#getChangedSegments` to send either a full snapshot or delta changes to the Broker, using the information present in `DataSourcesSnapshot`.
- Changed classes: `CoordinatorServerView`, `SqlSegmentsMetadataManager`, `SqlSegmentsMetadataQuery`, `MetadataResource`, `DataSourcesSnapshot`
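A rough sketch of how the Coordinator side could flip the flag, assuming the new `has_loaded` column and `SQLMetadataConnector#retryWithHandle`; the wrapper class, method name, and exact SQL are illustrative only:

```java
import org.apache.druid.metadata.SQLMetadataConnector;

public class SegmentLoadStatusUpdater
{
  private final SQLMetadataConnector connector;
  private final String segmentsTable;

  public SegmentLoadStatusUpdater(SQLMetadataConnector connector, String segmentsTable)
  {
    this.connector = connector;
    this.segmentsTable = segmentsTable;
  }

  /**
   * Called when the Coordinator receives a load callback from a Historical.
   * The flag is only ever flipped from false to true; it is not reset when the
   * segment later becomes unavailable.
   */
  public void markLoaded(String segmentId)
  {
    connector.retryWithHandle(
        handle -> handle.createStatement(
            "UPDATE " + segmentsTable + " SET has_loaded = true WHERE id = :id AND has_loaded = false"
        ).bind("id", segmentId).execute()
    );
  }
}
```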
Broker changes
- `MetadataSegmentView` polls the Coordinator to fetch the list of all used segments along with their `overshadowed` and `loaded` status. On the very first poll it receives a full snapshot; thereafter it receives delta updates.
- After every poll finishes, notify `BrokerServerView` to update its timeline with all segments that have been loaded:
  - Remove segments that are not used anymore, i.e. segments that are not present in the list polled from the Coordinator.
  - Add segments that are used and loaded to the timeline, if they don't already exist.
- While handling a query on the Broker, look up the segments required for the query in the timeline. If any of these segments is unavailable, throw an error (see the sketch after this list).
- Changed classes: `CachingClusteredClient`, `BrokerServerView`, `MetadataSegmentView`
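A minimal sketch of that query-time check, treating segment ids as plain strings; the helper below is hypothetical and only stands in for the change in `CachingClusteredClient`, which would consult the timeline maintained by `BrokerServerView`:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnavailableSegmentChecker
{
  /**
   * Fails fast if any segment required by the query is not currently loaded.
   * Whether an error is actually thrown would depend on the unavailableSegmentsAction
   * query context described under Usage.
   */
  public static void failOnUnavailableSegments(List<String> requiredSegmentIds, Set<String> loadedSegmentIds)
  {
    final List<String> unavailable = requiredSegmentIds
        .stream()
        .filter(segmentId -> !loadedSegmentIds.contains(segmentId))
        .collect(Collectors.toList());

    if (!unavailable.isEmpty()) {
      throw new IllegalStateException(
          "Segments " + unavailable + " required by the query are currently unavailable"
      );
    }
  }
}
```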
Segment lifecycle
The lifecycle of a segment s can be described as follows:
1. Creation: Segment metadata is published in the database at time `t1`.
2. Coordinator Polling: The Coordinator polls for new segments at time `t2` but does not consider `s` loaded yet.
3. Broker Polling: The Broker polls the Coordinator at time `t3` and finds segment `s`, but doesn't add it to the timeline as it's not loaded.
4. Historical Loading: The segment `s` is loaded by a Historical process at time `t4`.
5. Coordinator Update: The Coordinator receives a callback from the Historical and marks the segment as loaded at time `t5`.
Segment Availability
Once marked as loaded, a segment is considered available for querying. However, there are potential scenarios that can impact segment availability:
Scenario 1: Broker Receives Segment Metadata Before Historical Callback
- The Broker adds `s` to the timeline as `queryable` at time `t6`.
- Queries for `s` at this point might fail, as it's not fully loaded.
- Once the Broker receives the callback from the Historical at time `t7`, the segment becomes available for querying.
- If all Historicals serving `s` go down, the segment remains in the timeline as `queryable` but becomes unavailable for querying.
Scenario 2: Broker Receives Historical Callback Before Segment Metadata
- The Broker adds `s` to the timeline but doesn't mark it as `queryable` at time `t6`.
- Queries for `s` will still be processed, but the segment might not be fully available.
- Once the Broker receives metadata from the Coordinator at time `t7`, the segment is marked as `queryable`.
- Similar to Scenario 1, if all Historicals serving `s` go down, the segment remains in the timeline as `queryable` but becomes unavailable for querying.
Synchronisation issues
The following synchronisation conditions could cause temporary query failures:
- If the Broker isn't able to sync its timeline with the Coordinator, the Broker would be unaware of segments recently removed from a Historical.
- If the Broker is behind the Historical server, the sync with the Coordinator makes it aware of recently loaded segments, but the Broker would still think they are unavailable.
- Broker lags behind Historical:
  - Coordinator tells the Broker that a segment is used and loaded
    - start failing immediately if the segment is unavailable
    - maybe also log Historical sync times
  - can't do anything but remove the segment from the timeline and stop serving that data
- Historical has removed a loaded segment:
  - fail
  - current behaviour (feature off): can't do anything, start serving it?
  - new behaviour: don't serve it unless the Coordinator says it is loaded
  - if we don't do 2b, we might miss the problem scenario explained below
- Coordinator asked the Historical to remove a segment but doesn't yet know the unload has completed:
  - this would have the same behaviour as 1b
  - same behaviour as 2b
  - nothing to do, because neither the Historical nor the Broker would have made any decision
  - the Coordinator is the source of truth
Problem scenario: the Broker might serve something and then stop serving it without failing the query.
1. The Coordinator tells a Historical to load a segment.
2. The Historical loads the segment.
3. The Coordinator doesn't know yet that the segment is loaded.
4. The Broker knows the segment is loaded (via the Historical callback) and starts serving it.
5. The segment disappears.
6. The Broker stops serving the segment, but doesn't fail the query.

To avoid this, we have to do 2b, i.e. not serve a segment unless the Coordinator says it is loaded (sketched below).
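A minimal sketch of what that rule amounts to on the Broker, with hypothetical names; it only captures the serving condition, not the actual implementation:

```java
public class SegmentServingPolicy
{
  /**
   * With the feature enabled, a segment is served only when the Coordinator has
   * marked it as loaded AND at least one Historical currently announces it.
   * Without the feature, the Historical announcement alone would be enough,
   * which allows the problem scenario above.
   */
  public static boolean shouldServe(boolean coordinatorSaysLoaded, boolean announcedByAnyHistorical)
  {
    return coordinatorSaysLoaded && announcedByAnyHistorical;
  }
}
```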
Pending items
- If `druid.sql.planner.detectUnavailableSegments` is set, then `druid.sql.planner.metadataSegmentCacheEnable` takes effect automatically. Verify whether this behaviour is as expected or needs to change.
- `MetadataResource#getChangedSegments` should also return realtime segments if `CentralizedDatasourceSchema` is enabled on the Coordinator.
- Handle the scenario where the replication factor for a segment is updated to 0.
- Verify the potential synchronisation issues.
- Testing (unit tests, performance testing).
- Document the new API and metrics.
Usage
- `druid.sql.planner.detectUnavailableSegments` needs to be set in the Broker runtime properties.
- The `unavailableSegmentsAction` query context can be set to `allow` or `fail`, and queries will fail accordingly. In either case, the unavailable segments will be logged (see the example below).
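For illustration, a hypothetical setup based only on the names above (the `wikipedia` datasource is a placeholder):

```properties
# Broker runtime.properties
druid.sql.planner.detectUnavailableSegments=true
```

A SQL query posted to the Broker's `/druid/v2/sql/` endpoint could then opt into failing on unavailable segments via the query context:

```json
{
  "query": "SELECT COUNT(*) FROM wikipedia",
  "context": { "unavailableSegmentsAction": "fail" }
}
```

With `allow`, the query would not fail and the unavailable segments would only be logged.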
Upgrade considerations
TBA
Release note
TBA
This PR has:
- [x] been self-reviewed.
- [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in licenses.yaml
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
How will this design work when there are used segments that have a replication factor of zero?
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
> How will this design work when there are used segments that have a replication factor of zero?
If a segment is created with a replication factor of 0, it will never be loaded onto any Historical. Consequently, the segment won't be marked as loaded and will not be added to the Broker's timeline. This behaviour is as expected.
If a segment's replication factor is changed to 0 after it has been loaded, the segment will remain in the Broker's timeline with `queryable` status, resulting in queries finding the segment to be unavailable.
This problem can be fixed by detecting the replication factor change in the Coordinator and then sending a change event to the Broker to remove the segment metadata from the Broker's timeline.
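A rough sketch of that detection, with entirely hypothetical names; the Coordinator would diff replication factors between polls and collect segments whose target replica count dropped to zero so that a removal event can be sent to Brokers:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReplicationFactorWatcher
{
  /**
   * Returns the segments whose replication factor dropped from a positive value
   * to zero between two consecutive Coordinator polls.
   */
  public static Set<String> segmentsDroppedToZeroReplicas(
      Map<String, Integer> previousReplicationFactors,
      Map<String, Integer> currentReplicationFactors
  )
  {
    final Set<String> droppedToZero = new HashSet<>();
    currentReplicationFactors.forEach((segmentId, replicas) -> {
      final Integer previous = previousReplicationFactors.get(segmentId);
      if (replicas != null && replicas == 0 && previous != null && previous > 0) {
        droppedToZero.add(segmentId);
      }
    });
    return droppedToZero;
  }
}
```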
Now consider the scenario where the segment's replication factor changes back to non-zero and the Broker receives this updated metadata from the Coordinator before the segment is reported as loaded by any Historical. Queries in this period will find the segment to be unavailable. Is this behaviour correct? Or should the loaded field of the segment be dynamically adjusted when it is marked as cold?
Hi folks, is there interest in taking this PR across the finish line? If the overall design is still valid, I’m wondering if we could merge it in an experimental state documenting any caveats and limitations, and address those incrementally. I haven’t looked at the details yet, so not sure if the pending items are still relevant or if there are any major blockers. We have some use cases where this feature would be useful and I’m happy to help as needed.
cc: @cryptoe @kfaraz @gianm
@abhishekrb19, IIRC, there wasn't anything fundamentally wrong with the approach here, but the interest diminished over time and we never got around to finishing it.
To merge this PR, we would need the following:
- Resolve all merge conflicts
- Remove changes that are not relevant anymore, given that Druid has evolved significantly over the last 2 years in the Coordinator-Broker sync and other areas
- Update PR description with the current approach (I am sure the description would have gone out of date now)
- Ensure that CI passes
- Ensure that all new features are behind a feature flag and none of the existing code paths are affected
- Add sufficient unit tests and at least one integration test for the new feature
- Perform some basic testing on a Druid cluster
- Get 2 approvals
Unfortunately, I currently don't have the bandwidth to take this up. I will be happy to do a review though if you or anyone else can tick off the above items.
@kfaraz, thanks for the response! Noted, we'll discuss and get back.
I can take this up cc @maytasm