
Add a segmentMorphFactory to MSQ Datasource Destination

Open adarshsanjeev opened this issue 1 year ago • 0 comments

Description

Introduces the concept of segmentMorphFactory to DataSourceMSQDestination and refactors some of the existing code around frames and semantic utils. A segmentMorphFactory is a way to introduce an alternate final stage during ingestion with MSQ. This opens up the possibility of modifying segments instead of generating new segments as a result of the query.
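
As a rough illustration of the idea of an alternate final stage, the sketch below shows a destination that uses a morph factory when one is supplied and otherwise falls back to generating new segments. All names here (`FinalStageFactory`, `SegmentGeneratorStage`, `Destination`) are hypothetical stand-ins invented for this example; they are not the classes or signatures added by this PR.

```java
import java.util.Objects;

/** Hypothetical stand-in for a factory that builds the final stage of an ingestion query. */
interface FinalStageFactory {
    String describe();
}

/** Hypothetical default final stage: write brand-new segments from the query results. */
final class SegmentGeneratorStage implements FinalStageFactory {
    @Override
    public String describe() {
        return "generate new segments";
    }
}

/** Hypothetical destination holding an optional morph factory (null means "use the default"). */
final class Destination {
    private final String dataSource;
    private final FinalStageFactory morphFactory;

    Destination(String dataSource, FinalStageFactory morphFactory) {
        this.dataSource = Objects.requireNonNull(dataSource, "dataSource");
        this.morphFactory = morphFactory;
    }

    String dataSource() {
        return dataSource;
    }

    /** Picks the alternate final stage if one was configured, else the default generator. */
    FinalStageFactory finalStage() {
        return morphFactory != null ? morphFactory : new SegmentGeneratorStage();
    }
}
```

With no morph factory the behavior is unchanged, which matches the statement that the PR has no functional impact on its own.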

This PR should not have a functional impact, and is meant to be used for other features in the future.

Major changes include:

  • Refactor SemanticCreator and add SemanticUtils.
  • Add a segmentMorphFactory FrameProcessorFactory to MSQ.
  • Refactor SimpleQueryableIndex to take a metadata supplier.
  • Refactor TmpFileSegmentWriteOutMedium to use a heap-based WriteOutBytes before falling back to a temporary file.
  • Add FieldReader#makeRACColumn for use with row based frames.
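
The heap-first, file-fallback pattern mentioned for TmpFileSegmentWriteOutMedium can be sketched generically as below. This is a minimal illustration under assumptions, not Druid's WriteOutBytes implementation: the class name `SpillingBuffer`, the `maxHeapBytes` threshold, and the temp-file naming are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical buffer: keeps bytes on the heap until a limit, then spills to a temp file. */
final class SpillingBuffer extends OutputStream {
    private final int maxHeapBytes;
    private ByteArrayOutputStream heap = new ByteArrayOutputStream();
    private OutputStream file; // null until the buffer has spilled
    private Path path;
    private long size;

    SpillingBuffer(int maxHeapBytes) {
        this.maxHeapBytes = maxHeapBytes;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Spill before the write that would push the heap copy past its limit.
        if (file == null && size + len > maxHeapBytes) {
            spill();
        }
        if (file != null) {
            file.write(buf, off, len);
        } else {
            heap.write(buf, off, len);
        }
        size += len;
    }

    /** Moves everything buffered so far into a temporary file and releases the heap copy. */
    private void spill() throws IOException {
        path = Files.createTempFile("spill", ".bin");
        file = Files.newOutputStream(path);
        heap.writeTo(file);
        heap = null;
    }

    boolean isSpilled() {
        return file != null;
    }

    long size() {
        return size;
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
            Files.deleteIfExists(path);
        }
    }
}
```

Small outputs never touch the disk in this scheme; only outputs that outgrow the threshold pay for a temporary file.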

This PR has:

  • [ ] been self-reviewed.
    • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] a release note entry in the PR description.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [ ] added integration tests.
  • [ ] been tested in a test Druid cluster.

adarshsanjeev, Jul 04 '24 06:07