Add segment id parameter to segment metadata query
Description
This proposes to enhance the SegmentMetadataQuery by introducing a new optional parameter: segmentIds. This parameter allows users to query metadata for specific segments directly by their segmentId, rather than relying solely on interval-based filtering.
Motivation
This feature will be particularly useful for use cases such as:
- Debugging or inspecting individual segments;
- Validating the state of a known segment after ingestion or compaction;
- Programmatic access in custom tooling where segment IDs are already known.
Proposed Changes
- Query Definition Layer
  - Extend `SegmentMetadataQuery` to include a `List<String> segmentIds` field.
  - Ensure proper serialization/deserialization with Jackson.
  - Update equality, hashCode, and toString logic accordingly.
- Query Runner
  - Modify `SegmentMetadataQueryRunner` to evaluate and skip segments whose `segmentId` is not in the provided list.
- Query Planning / Timeline Resolution
  - Update `CachingClusteredClient` (on the Broker) to support filtering segments by `segmentId` before dispatching queries.
  - Introduce a utility to map `segmentId` to `SegmentDescriptor`, or extend `VersionedIntervalTimeline` if appropriate.
- Backward Compatibility
  - The new parameter will be optional and non-intrusive: if not specified, current behavior is preserved.
- Testing
  - Add unit tests for query definition, runner logic, and broker-level filtering behavior.
  - Extend integration tests to cover mixed queries with and without `segmentIds`.
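The runner-side skip described under Query Runner could be sketched roughly as follows. This is only an illustration, not Druid code: `SegmentIdFilterSketch` and its methods are hypothetical names, segments are represented by their ID strings alone, and treating an empty list as "match nothing" is an assumption that the proposal leaves open.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the proposed filter; not a Druid class.
final class SegmentIdFilterSketch {
    // null means "segmentIds was not specified", which preserves current behavior.
    private final Set<String> segmentIds;

    SegmentIdFilterSketch(List<String> segmentIds) {
        this.segmentIds = segmentIds == null ? null : new LinkedHashSet<>(segmentIds);
    }

    // True when the runner should analyze the segment, false when it should skip it.
    boolean shouldAnalyze(String segmentId) {
        return segmentIds == null || segmentIds.contains(segmentId);
    }

    // Keeps only the candidate segments that pass the filter, preserving their order.
    List<String> select(List<String> candidateSegmentIds) {
        List<String> selected = new ArrayList<>();
        for (String id : candidateSegmentIds) {
            if (shouldAnalyze(id)) {
                selected.add(id);
            }
        }
        return selected;
    }
}
```

Keeping "not specified" (null) distinct from "specified but empty" is what lets an omitted parameter fall back to interval-only behavior, while an explicit empty list can be given its own meaning or rejected during validation.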
Impacted Classes
The following classes are expected to be modified as part of this change:
- org.apache.druid.query.metadata.metadata.SegmentMetadataQuery
- org.apache.druid.query.metadata.metadata.SegmentMetadataQueryRunner
- org.apache.druid.client.CachingClusteredClient
- org.apache.druid.query.SegmentDescriptor
- org.apache.druid.timeline.VersionedIntervalTimeline (if necessary to locate segments by ID)
- org.apache.druid.segment.ReferenceCountingSegment (for ID exposure)
- org.apache.druid.query.QueryToolChest (for caching or context changes)
- org.apache.druid.query.QueryRunnerTestHelper (for test support)
Example Usage
Query part
{
  "queryType": "segmentMetadata",
  "dataSource": "sample_datasource",
  "segmentIds": [
    "sample_datasource_2025-12-01T00:00:00.000Z_2025-12-02T00:00:00.000Z_2025-12-02T00:00:00.000Z_v1"
  ]
}
Response part
[
  {
    "id": "sample_datasource_2025-12-01T00:00:00.000Z_2025-12-02T00:00:00.000Z_2025-12-02T00:00:00.000Z_v1",
    "intervals": ["2025-12-01T00:00:00.000Z/2025-12-02T00:00:00.000Z"],
    "columns": {
      "__time": {
        "type": "LONG",
        "typeSignature": "LONG",
        "hasMultipleValues": false,
        "hasNulls": false,
        "size": 800000,
        "cardinality": null,
        "errorMessage": null
      },
      "user_id": {
        "type": "STRING",
        "typeSignature": "STRING",
        "hasMultipleValues": false,
        "hasNulls": false,
        "size": 2000000,
        "cardinality": 135000,
        "errorMessage": null
      },
      "event_type": {
        "type": "STRING",
        "typeSignature": "STRING",
        "hasMultipleValues": false,
        "hasNulls": true,
        "size": 500000,
        "cardinality": 25,
        "errorMessage": null
      },
      "metric_clicks": {
        "type": "FLOAT",
        "typeSignature": "FLOAT",
        "hasMultipleValues": false,
        "hasNulls": false,
        "size": 1000000,
        "cardinality": null,
        "errorMessage": null
      }
    },
    "aggregators": {
      "metric_clicks": {
        "type": "floatSum",
        "name": "metric_clicks",
        "fieldName": "metric_clicks"
      }
    },
    "queryGranularity": {
      "type": "minute"
    },
    "size": 4500000,
    "numRows": 1000000,
    "rollup": false
  }
]
Testing
Unit Tests
- Add tests in `SegmentMetadataQueryTest` to validate correct behavior when `segmentIds` is provided or omitted.
- Extend `SegmentMetadataQueryRunnerTest` to ensure only the specified segments are queried.
- Add test coverage for edge cases, such as empty or non-existent `segmentIds`.
Integration Tests
- Update or extend `ITSegmentMetadataTest` to include scenarios using the new `segmentIds` parameter.
- Add new tests that:
  - Query metadata for a single known segment.
  - Query with multiple segment IDs across intervals.
  - Query with a mix of valid and invalid segment IDs (expect partial results or error handling).
  - Validate compatibility with existing query parameters (e.g., `toInclude`, `merge`).
- Verify that the query returns accurate and expected results without performance regressions.
Alternatives Considered
We also considered performing this filtering on the client side, but that would require querying metadata for irrelevant segments and discarding it afterwards, which is inefficient for large datasources. Implementing it natively at the Broker and QueryRunner layers is more scalable and consistent.
Backward Compatibility
The introduction of the segmentIds parameter will be designed to be optional and will not break any existing functionality. If the segmentIds parameter is not provided in the query, the current behavior based on interval filtering will remain unchanged.
However, we recognize that this new feature might require certain modifications in existing systems or tooling, especially for users who rely on interval-based querying for segment metadata. To mitigate any potential compatibility issues:
- Query Compatibility:
  - If `segmentIds` is used alongside `intervals`, the query will return metadata only for segments whose `segmentId` matches the provided list, within the specified interval.
  - If no `segmentIds` are provided, the system will continue to use the interval-based filtering mechanism, ensuring seamless backward compatibility.
- Documentation and Communication:
  - Documentation will be updated to highlight this new optional parameter, with examples for both use cases, one with and one without the `segmentIds` parameter.
  - Users who have been using segment metadata queries with interval-based filtering will not experience any changes unless they explicitly choose to use the `segmentIds` parameter.
- Feature Flagging:
  - To ensure smooth rollout, this feature could be initially introduced behind a feature flag, allowing users to opt-in and test the new functionality before enabling it fully in production environments.
- Fallback Mechanism:
  - If a `segmentId` does not exist (e.g., due to a typo or missing segment), the query will gracefully handle the error, either by returning an empty result for the invalid `segmentId` or providing an appropriate error message, depending on the desired behavior.
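One way to picture how `segmentIds` and `intervals` might combine, and how unknown IDs could be surfaced, is the sketch below. It is a simplified model under stated assumptions, not the Broker's actual timeline logic: `SegmentIdResolutionSketch`, `KnownSegment`, and `Resolution` are hypothetical names, intervals are treated as half-open, and an ID that exists but lies outside the requested interval is reported as unmatched.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types are Druid classes.
final class SegmentIdResolutionSketch {

    // Minimal stand-in for a segment known to the Broker's timeline.
    static final class KnownSegment {
        final String id;
        final Instant start;
        final Instant end;

        KnownSegment(String id, String start, String end) {
            this.id = id;
            this.start = Instant.parse(start);
            this.end = Instant.parse(end);
        }

        // Half-open overlap check: [start, end) against [from, to).
        boolean overlaps(Instant from, Instant to) {
            return start.isBefore(to) && from.isBefore(end);
        }
    }

    // Segments to query, plus requested IDs that matched nothing.
    static final class Resolution {
        final List<String> matched = new ArrayList<>();
        final Set<String> unmatched = new LinkedHashSet<>();
    }

    // A null segmentIds list means "not specified": fall back to interval-only filtering.
    static Resolution resolve(List<KnownSegment> known, Instant from, Instant to, List<String> segmentIds) {
        Resolution result = new Resolution();
        if (segmentIds != null) {
            result.unmatched.addAll(segmentIds);
        }
        for (KnownSegment segment : known) {
            if (!segment.overlaps(from, to)) {
                continue; // outside the requested interval
            }
            if (segmentIds == null || segmentIds.contains(segment.id)) {
                result.matched.add(segment.id);
                result.unmatched.remove(segment.id);
            }
        }
        return result;
    }
}
```

Returning the unmatched IDs separately leaves the policy decision open: the caller can ignore them and return partial results, or turn a non-empty set into an error message.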
By implementing this optional parameter in a non-intrusive manner, the overall system remains compatible with existing workloads and users are given the flexibility to adopt the new feature at their discretion.
It is actually possible to do this today! Although, the way you do it is not documented. But maybe we should document it? It involves placing a list of segments in the intervals field of a query. This feature exists because when the Broker passes down your query to Historicals, it replaces the intervals with the list of segments that the specific Historical should be querying.
It looks like:
{
  "queryType": "segmentMetadata",
  "dataSource": "sample_datasource",
  "intervals": {
    "type": "segments",
    "segments": [
      {
        "itvl": "2025-12-01T00:00:00.000Z/2025-12-02T00:00:00.000Z",
        "ver": "2025-12-02T00:00:00.000Z",
        "part": 1
      }
    ]
  }
}
It is indeed possible to do this with any query type.
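For tooling that only has a segment ID string, a small helper could derive the `itvl`, `ver`, and `part` values used in the `segments` spec above. The sketch below is hypothetical and deliberately naive: it assumes IDs of the form `dataSource_start_end_version[_partitionNum]`, that the datasource name is known, that the version contains no underscores, and that a missing partition number means partition 0. Druid's own `SegmentId` parsing handles more cases than this.

```java
import java.util.Optional;

// Hypothetical helper; not a Druid class.
final class SegmentDescriptorSketch {
    final String itvl; // "start/end" interval string
    final String ver;  // segment version
    final int part;    // partition number

    private SegmentDescriptorSketch(String itvl, String ver, int part) {
        this.itvl = itvl;
        this.ver = ver;
        this.part = part;
    }

    // Returns empty when the ID does not fit the assumed format.
    static Optional<SegmentDescriptorSketch> fromSegmentId(String dataSource, String segmentId) {
        String prefix = dataSource + "_";
        if (!segmentId.startsWith(prefix)) {
            return Optional.empty();
        }
        // Remaining pieces: start, end, version, and optionally the partition number.
        String[] parts = segmentId.substring(prefix.length()).split("_");
        if (parts.length < 3 || parts.length > 4) {
            return Optional.empty();
        }
        int partition = 0;
        if (parts.length == 4) {
            try {
                partition = Integer.parseInt(parts[3]);
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        return Optional.of(new SegmentDescriptorSketch(parts[0] + "/" + parts[1], parts[2], partition));
    }

    // Renders one entry of the "segments" array.
    String toJson() {
        return "{\"itvl\": \"" + itvl + "\", \"ver\": \"" + ver + "\", \"part\": " + part + "}";
    }
}
```

Stripping the known datasource prefix first sidesteps the ambiguity of datasource names that themselves contain underscores, such as `sample_datasource`.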