
Support non time order in MSQ compaction

Open gargvishesh opened this issue 1 year ago • 0 comments

Description

https://github.com/apache/druid/pull/16849 added support for sorting segments by non-time columns. This PR extends that support to MSQ compaction. Specifically, if forceSegmentSortByTime is set in the data schema, either in the user-supplied compaction config or in the inferred schema, the following steps are taken:

  • Skip adding __time explicitly as the first column of the dimension schema, since it is already part of the schema.
  • Ensure the column mappings propagate __time in the order specified by the schema.
  • Set forceSegmentSortByTime in the MSQ context.
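
To make the three steps concrete, here is a minimal sketch in Python. It is not the code from this PR: the function `plan_msq_compaction`, its parameters, and the tuple-based column mappings are hypothetical names used only for illustration. It also assumes that "set" means the flag is present (not `None`), and that when the flag is absent `__time` is prepended to the dimension list.

```python
from typing import Dict, List, Optional, Tuple

TIME_COLUMN = "__time"


def plan_msq_compaction(
    dimensions: List[str],
    force_segment_sort_by_time: Optional[bool],
) -> Tuple[List[str], List[Tuple[str, str]], Dict[str, bool]]:
    """Hypothetical helper, not a Druid API.

    Returns (dimension order, column mappings, MSQ context entries).
    `None` stands for "flag not set in the data schema" (an assumption).
    """
    if force_segment_sort_by_time is None:
        # Assumed behavior without the flag: __time is added explicitly
        # as the first column.
        ordered = [TIME_COLUMN] + [d for d in dimensions if d != TIME_COLUMN]
        context: Dict[str, bool] = {}
    else:
        # Step 1: do not prepend __time; the schema already carries it.
        ordered = list(dimensions)
        # Step 3: pass the flag through to the MSQ context.
        context = {"forceSegmentSortByTime": force_segment_sort_by_time}

    # Step 2: column mappings follow the schema order, so __time keeps
    # whatever position the schema gave it.
    mappings = [(name, name) for name in ordered]
    return ordered, mappings, context


if __name__ == "__main__":
    print(plan_msq_compaction(["country", "__time", "page"], False))
    print(plan_msq_compaction(["country", "page"], None))
```

In the first call the schema places `__time` second, and that position is preserved in both the dimension order and the mappings. The second call shows the assumed behavior when the flag is absent.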

The PR also adds previously missing unit tests that verify the MSQ spec generated for nested and auto-type columns.

This PR has:

  • [x] been self-reviewed.
    • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] a release note entry in the PR description.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [x] been tested in a test Druid cluster.

gargvishesh, Oct 10 '24 04:10