
Allows multiple requests per server per request ID

Open · egalpin opened this pull request · 6 comments

Relates to https://github.com/apache/pinot/issues/10712

This PR proposes removing the concept of separate "servers" for OFFLINE and REALTIME query handling. Instead, queries are uniquely identified by the physical table they target in the actual query (myTable_OFFLINE or myTable_REALTIME); the hashcode of the Thrift PinotQuery object is unique per server, per physical table.

This change helps pave the way for Logical Table support by allowing a single broker request to more easily "fanout" into arbitrarily many requests issued to each required server.

There may be a few rough edges here and there, but I'd like to open this for feedback on the concept and current implementation.
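To make the keying idea above concrete, here is a minimal sketch of a per-server, per-physical-table routing key. The class and field names (`QueryRoutingKey`, `serverId`) are illustrative assumptions, not the actual types in this PR; the real implementation keys off the Thrift `PinotQuery` hashcode.

```java
import java.util.Objects;

// Hypothetical sketch: a key that is unique per server, per physical table,
// per query hash, so one request ID can carry multiple server requests.
final class QueryRoutingKey {
    private final String serverId;        // e.g. "Server_host_8098" (assumed format)
    private final String physicalTable;   // e.g. "myTable_OFFLINE" or "myTable_REALTIME"
    private final int queryHash;          // hashCode() of the Thrift PinotQuery

    QueryRoutingKey(String serverId, String physicalTable, int queryHash) {
        this.serverId = serverId;
        this.physicalTable = physicalTable;
        this.queryHash = queryHash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof QueryRoutingKey)) {
            return false;
        }
        QueryRoutingKey k = (QueryRoutingKey) o;
        return queryHash == k.queryHash
            && serverId.equals(k.serverId)
            && physicalTable.equals(k.physicalTable);
    }

    @Override
    public int hashCode() {
        return Objects.hash(serverId, physicalTable, queryHash);
    }
}
```

With such a key, the OFFLINE and REALTIME requests for the same logical query map to distinct entries even when sent to the same server under one request ID.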

egalpin avatar Aug 03 '24 00:08 egalpin

Codecov Report

Attention: Patch coverage is 61.71617% with 116 lines in your changes missing coverage. Please review.

Project coverage is 63.79%. Comparing base (59551e4) to head (41ebbbb). Report is 1585 commits behind head on master.

Files with missing lines Patch % Lines
...sthandler/BaseSingleStageBrokerRequestHandler.java 50.00% 13 Missing and 9 partials :warning:
...pache/pinot/core/transport/AsyncQueryResponse.java 69.56% 17 Missing and 4 partials :warning:
...roker/requesthandler/GrpcBrokerRequestHandler.java 0.00% 12 Missing :warning:
...r/spark/common/reader/PinotServerDataFetcher.scala 0.00% 12 Missing :warning:
...thandler/SingleConnectionBrokerRequestHandler.java 0.00% 11 Missing :warning:
...rg/apache/pinot/core/transport/ServerChannels.java 47.05% 5 Missing and 4 partials :warning:
...a/org/apache/pinot/core/transport/QueryRouter.java 80.55% 5 Missing and 2 partials :warning:
...rg/apache/pinot/core/transport/ServerInstance.java 33.33% 3 Missing and 1 partial :warning:
...e/pinot/core/query/reduce/BrokerReduceService.java 86.95% 2 Missing and 1 partial :warning:
...e/pinot/core/query/request/ServerQueryRequest.java 66.66% 3 Missing :warning:
... and 7 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13742      +/-   ##
============================================
+ Coverage     61.75%   63.79%   +2.04%     
- Complexity      207     1611    +1404     
============================================
  Files          2436     2710     +274     
  Lines        133233   151272   +18039     
  Branches      20636    23348    +2712     
============================================
+ Hits          82274    96500   +14226     
- Misses        44911    47539    +2628     
- Partials       6048     7233    +1185     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) :arrow_up:
integration 100.00% <ø> (+99.99%) :arrow_up:
integration1 100.00% <ø> (+99.99%) :arrow_up:
integration2 0.00% <ø> (ø)
java-11 63.75% <61.71%> (+2.04%) :arrow_up:
java-21 63.67% <61.71%> (+2.05%) :arrow_up:
skip-bytebuffers-false 63.76% <61.71%> (+2.01%) :arrow_up:
skip-bytebuffers-true 63.66% <61.71%> (+35.93%) :arrow_up:
temurin 63.79% <61.71%> (+2.04%) :arrow_up:
unittests 63.78% <61.71%> (+2.04%) :arrow_up:
unittests1 56.36% <73.30%> (+9.47%) :arrow_up:
unittests2 34.04% <12.21%> (+6.31%) :arrow_up:

Flags with carried forward coverage won't be shown.


codecov-commenter avatar Aug 07 '24 21:08 codecov-commenter

@siddharthteotia @jackjlli @vvivekiyer I'd love to get some guidance on the regression suites. I believe I understand why the failures exist: depending on the order in which components are upgraded, brokers and servers may be incompatible across versions with this change as currently proposed.

If servers were guaranteed to be upgraded first, upgrades should be smooth. However, if brokers were upgraded first, server data-table responses would lack the expected metadata field QUERY_HASH.

What's the preferred approach for this kind of change? It may be possible to create intentional throw-away code as a compatibility bridge, but this would imply that reaching the final state requires multiple upgrades (possibly spanning multiple minor versions, which feels very drawn out).

egalpin avatar Aug 13 '24 21:08 egalpin

@siddharthteotia @jackjlli @vvivekiyer Is there a specific order in which different pinot components must be restarted when performing an upgrade?

I can think of a smooth way to introduce these changes if servers are updated first (servers start returning additional and initially unused metadata with datatable responses), but it's much more challenging as written if brokers must be updated first. Thoughts?

egalpin avatar Sep 03 '24 22:09 egalpin

> @siddharthteotia @jackjlli @vvivekiyer Is there a specific order in which different pinot components must be restarted when performing an upgrade?
>
> I can think of a smooth way to introduce these changes if servers are updated first (servers start returning additional and initially unused metadata with datatable responses), but it's much more challenging as written if brokers must be updated first. Thoughts?

The common way to roll out a new release is controller -> broker -> server -> minion, so pinot-brokers are always rolled out before pinot-servers. You can check the logic in pinot_compatibility_tests.yml, which mimics this rollout behavior. So it'd be good for the broker to have logic that recognizes both the old and new signatures (i.e. making it backward compatible).
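One way to picture the backward-compatible handling suggested here: the broker checks whether a server response carries the new QUERY_HASH metadata and falls back to the legacy per-server correlation when it does not. This is a hypothetical sketch; the metadata key string and method names are assumptions, not the PR's actual code.

```java
import java.util.Map;

// Illustrative compatibility bridge: correlate a server response with its
// request whether or not the server has been upgraded yet.
final class ResponseKeyResolver {
    // Assumed metadata key written by upgraded servers.
    static final String QUERY_HASH = "queryHash";

    /**
     * Old servers omit QUERY_HASH, so fall back to the legacy per-server
     * request id; new servers include it, enabling per-table correlation.
     */
    static String resolveKey(long requestId, Map<String, String> responseMetadata) {
        String queryHash = responseMetadata.get(QUERY_HASH);
        if (queryHash == null) {
            return Long.toString(requestId);   // legacy path: one request per server
        }
        return requestId + ":" + queryHash;    // new path: one request per physical table
    }
}
```

Once all servers are upgraded, the legacy branch becomes dead code and can be removed in a later release.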

jackjlli avatar Sep 04 '24 20:09 jackjlli

From what I can tell from the logs, the failing tests are coming from pinot-controller. However, I believe they are flaky, as I'm able to see the e2e tests succeed locally via mvn test -T 16 -pl 'pinot-controller':

[Screenshot: passing local pinot-controller test run, 2024-10-17 15:27]

egalpin avatar Oct 17 '24 22:10 egalpin

@siddharthteotia @jackjlli @vvivekiyer I recognize that this PR makes changes on a critical path, so would it be mutually beneficial to schedule a walk-through of this review over video conference? I'd be happy to do so.

egalpin avatar Oct 22 '24 16:10 egalpin

[image attachment]

egalpin avatar Nov 13 '24 21:11 egalpin

> I want to see the final desired way to handle multiple requests per request id. Do you plan to use a single request id on the broker side and let the server split it on a per-table basis? I think that works, and it actually opens up an opportunity to do a first-level merge on the server side before sending the response to the broker.

My concept is to have a single request ID associated with all the server requests that are issued. Each request would be sent to servers the same way it is today, with each table's settings and configs respected. From the server's perspective, it would just be a set of queries like any other, as if a query to each table had been made in quick succession. Then, using the shared request ID, the broker would reduce all the server responses from each targeted table down to a single result table. We could definitely use multiple request IDs and track them on the broker side; I hadn't previously considered that.

The current approach, in its final form, would also instantiate a distinct BrokerRequest per table associated with a logical table. That is needed because certain table settings can cause query re-writes, such as enforcing approximation for distinct queries based on table-level settings. We could definitely get more intelligent about this and merge into a single server request any queries that end up with the same query contents (i.e. the same re-write or no re-writes). Here's a rough example of how that might work (changes shown between this PR branch and another dev branch I have): https://github.com/egalpin/pinot/compare/egalpin/refactor-broker-request...egalpin:pinot:egalpin/refactor-broker-request-logical-table-expansion?expand=1
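The merge idea described above can be sketched as grouping the per-table rewritten queries by their final query content, so tables whose rewrites are identical share one server request. This is a simplified illustration under the assumption that rewritten queries can be compared as strings; the linked branch works on actual query objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse identical rewritten queries so that
// tables sharing a rewrite also share a single server request.
final class QueryMerger {
    /** Maps rewritten-query text -> physical tables that can share that request. */
    static Map<String, List<String>> groupByRewrittenQuery(Map<String, String> rewrittenQueryByTable) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : rewrittenQueryByTable.entrySet()) {
            groups.computeIfAbsent(e.getValue(), q -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

A table whose settings force a re-write (e.g. distinct-count approximation) lands in its own group, while tables with no re-writes collapse into one fanout request.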

egalpin avatar Nov 15 '24 22:11 egalpin