Add multi stream ingestion support
feature
Reference: https://github.com/apache/pinot/issues/13780 Design Doc
Please refer to the design doc for details. TL;DR:
- Add support for a single table to ingest from multiple sources
- Use the existing interface (TableConfig) to define multiple streams
- Separate the partition id definition between the stream and the Pinot segment
- Compatible with the existing stream partition auto-expansion logic
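To make the partition id separation concrete, here is a minimal sketch of one possible scheme: each stream config gets its own padded range of Pinot partition ids, so partitions from different topics never collide. The class name, the method names, the padding value of 10,000, and the example topics are all assumptions for illustration and are not taken from this description; see the design doc for the actual definition.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only. The padding constant and all names here are
// assumptions, not the definitions used by the PR.
final class MultiStreamPartitionSketch {
  // Assumed upper bound on the number of partitions in a single stream.
  static final int PARTITION_PADDING_OFFSET = 10_000;

  // Combine the index of the stream config and the stream's own partition id
  // into a single partition id that is unique within the table.
  static int toPinotPartitionId(int streamConfigIndex, int streamPartitionId) {
    if (streamConfigIndex < 0) {
      throw new IllegalArgumentException("Negative stream config index");
    }
    if (streamPartitionId < 0 || streamPartitionId >= PARTITION_PADDING_OFFSET) {
      throw new IllegalArgumentException("Stream partition id out of range");
    }
    return streamConfigIndex * PARTITION_PADDING_OFFSET + streamPartitionId;
  }

  // Recover which stream config a Pinot partition id belongs to.
  static int toStreamConfigIndex(int pinotPartitionId) {
    return pinotPartitionId / PARTITION_PADDING_OFFSET;
  }

  // Recover the partition id as seen by the stream itself.
  static int toStreamPartitionId(int pinotPartitionId) {
    return pinotPartitionId % PARTITION_PADDING_OFFSET;
  }

  public static void main(String[] args) {
    // Two hypothetical stream configs, listed in the order they would appear
    // in the table config. Topic names are made up for this example.
    List<Map<String, String>> streamConfigMaps = List.of(
        Map.of("streamType", "kafka", "stream.kafka.topic.name", "clicks"),
        Map.of("streamType", "kafka", "stream.kafka.topic.name", "orders"));

    for (int i = 0; i < streamConfigMaps.size(); i++) {
      String topic = streamConfigMaps.get(i).get("stream.kafka.topic.name");
      int pinotPartitionId = toPinotPartitionId(i, 3);
      System.out.println(topic + " partition 3 -> Pinot partition " + pinotPartitionId);
    }
  }
}
```

Under this assumed scheme, a stream that gains partitions only uses more ids inside its own range, which is one way the mapping could stay compatible with partition auto-expansion.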
The feature was tested on multiple Kafka topics with different decoder formats. Due to resource limitations, we were not able to test other upstream sources end to end.
Some TODOs:
- Validation and limitations on multiple stream configs.
- Standardize the usage of the StreamConfig object, e.g. some usages only read non-topic-related static metadata and should use another interface.
- Support for adding/removing streams, or a sanity check for it.
Codecov Report
Attention: Patch coverage is 63.21839% with 64 lines in your changes missing coverage. Please review.
Project coverage is 64.03%. Comparing base (59551e4) to head (d8f46da). Report is 1483 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #13790 +/- ##
============================================
+ Coverage 61.75% 64.03% +2.27%
- Complexity 207 1605 +1398
============================================
Files 2436 2703 +267
Lines 133233 149053 +15820
Branches 20636 22849 +2213
============================================
+ Hits 82274 95439 +13165
- Misses 44911 46620 +1709
- Partials 6048 6994 +946
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration1 | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration2 | 0.00% <ø> (ø) | |
| java-11 | 63.95% <63.21%> (+2.24%) | :arrow_up: |
| java-21 | 63.92% <63.21%> (+2.30%) | :arrow_up: |
| skip-bytebuffers-false | 63.97% <63.21%> (+2.22%) | :arrow_up: |
| skip-bytebuffers-true | 63.90% <63.21%> (+36.17%) | :arrow_up: |
| temurin | 64.03% <63.21%> (+2.27%) | :arrow_up: |
| unittests | 64.02% <63.21%> (+2.27%) | :arrow_up: |
| unittests1 | 56.29% <33.70%> (+9.40%) | :arrow_up: |
| unittests2 | 34.47% <52.29%> (+6.74%) | :arrow_up: |
Production report: the feature has been running in Uber's production environment for months on petabytes of data, across hundreds of Pinot tables. A single table can ingest from 20-30+ topics with no issues. Overall ingestion and query performance is also competitive with common single-topic ingestion.
Note: the feature is running with multi-topic Kafka ingestion. We do not have the resources to run it with other stream types or with a mix of stream types.
Thanks @itschrispeck for identifying an edge case and proposing the fix in commit https://github.com/apache/pinot/pull/13790/commits/6bd1307c9249ce73232865cfe4edcc967b058111, which addresses the missing-segments issue that occurs when partition metadata cannot be fetched from the stream.
@Jackie-Jiang could you please review again and see if we still have any blockers before merging? Thanks.
Thanks again for contributing this feature. Is there a user doc associated with this feature?
@kishoreg thanks for bringing this up. After the PR is merged, I will update pinot-doc with the feature and its usage. In general, users can define the table config in exactly the same way, using the existing interfaces. I will provide an example in the PR description.