
[multi-stage] allow configurable timeout

Open walterddr opened this issue 3 years ago • 6 comments

This enables timeout configuration at either:

  • the server config level, via the broker timeout or the query runner timeout
    • either config key is fine; one is in nanoseconds, the other in milliseconds
  • or via query options

TODO: add a test covering the query-option override.
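For illustration, here is a minimal sketch of that override order. The class name and config handling are hypothetical, not Pinot's actual constants; it only shows the intent: a per-query option beats the server-level config, with a cluster default as the last resort.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the override order described above; the names
// are illustrative, not Pinot's actual config keys.
public final class TimeoutResolver {
  private static final long DEFAULT_TIMEOUT_MS = 10_000L; // cluster default

  /** The per-query option wins; otherwise fall back to the server-level config. */
  static long resolveTimeoutMs(Map<String, String> queryOptions, long serverTimeoutNanos) {
    String override = queryOptions.get("timeoutMs"); // per-query override
    if (override != null) {
      return Long.parseLong(override);
    }
    // One server config key is in nanoseconds, the other in milliseconds;
    // normalize the nanosecond one here.
    return serverTimeoutNanos > 0
        ? TimeUnit.NANOSECONDS.toMillis(serverTimeoutNanos)
        : DEFAULT_TIMEOUT_MS;
  }
}
```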

walterddr avatar Oct 11 '22 23:10 walterddr

Codecov Report

Merging #9571 (4afef37) into master (d50f9ee) will decrease coverage by 0.02%. The diff coverage is 84.37%.

@@             Coverage Diff              @@
##             master    #9571      +/-   ##
============================================
- Coverage     70.40%   70.38%   -0.03%     
+ Complexity     5012     4932      -80     
============================================
  Files          1973     1973              
  Lines        105815   105822       +7     
  Branches      16032    16033       +1     
============================================
- Hits          74503    74478      -25     
- Misses        26131    26162      +31     
- Partials       5181     5182       +1     
| Flag | Coverage Δ | |
| --- | --- | --- |
| integration1 | 25.10% <9.37%> (-0.04%) | :arrow_down: |
| integration2 | 24.87% <0.00%> (+0.19%) | :arrow_up: |
| unittests1 | 67.83% <85.71%> (-0.01%) | :arrow_down: |
| unittests2 | 15.81% <75.00%> (+<0.01%) | :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| ...rg/apache/pinot/query/service/QueryDispatcher.java | 84.37% <50.00%> (-0.32%) | :arrow_down: |
| ...requesthandler/MultiStageBrokerRequestHandler.java | 59.45% <75.00%> (+0.30%) | :arrow_up: |
| ...va/org/apache/pinot/query/runtime/QueryRunner.java | 86.74% <100.00%> (+0.16%) | :arrow_up: |
| ...query/runtime/operator/MailboxReceiveOperator.java | 91.52% <100.00%> (-0.15%) | :arrow_down: |
| .../pinot/query/runtime/plan/PhysicalPlanVisitor.java | 96.87% <100.00%> (ø) | |
| ...e/pinot/query/runtime/plan/PlanRequestContext.java | 93.75% <100.00%> (+0.89%) | :arrow_up: |
| ...t/query/runtime/plan/ServerRequestPlanVisitor.java | 79.20% <100.00%> (+0.63%) | :arrow_up: |
| .../runtime/plan/server/ServerPlanRequestContext.java | 100.00% <100.00%> (ø) | |
| ...va/org/apache/pinot/query/service/QueryServer.java | 76.08% <100.00%> (+0.53%) | :arrow_up: |
| ...ntroller/helix/core/minion/TaskMetricsEmitter.java | 34.88% <0.00%> (-51.17%) | :arrow_down: |
| ... and 31 more | | |


codecov-commenter avatar Oct 12 '22 00:10 codecov-commenter

Correct me if I am wrong: the queryTimeoutMs in query options only applies to the query server, and this one applies to the broker?

So the query one can be tuned per query while this one is tuned per table config? That feels a little weird to me.

61yao avatar Oct 13 '22 18:10 61yao

@61yao unfortunately there's no direct equivalence. The current query timeout logic is:

  1. The query timeout can be overridden, in this order of precedence:
     • user config inside the query payload
     • table config
     • cluster config
     • cluster default (10 sec)
  2. The query timeout clock starts at the beginning of query planning:
     • only the remainder of the timeout (i.e. total minus planning duration) is sent to the server as the server timeout.

From a user perspective I think the user-config override should still apply at the top level (from the user's view, a query runs for at most X seconds). However:

  1. There's no direct equivalent of the table-config override:
     • consider a join of 2 tables: which table's override should we apply?
     • do we apply it per stage or per query?
  2. If the user specifies a timeout for the entire query, how do we split it between stages?

Here is my proposal (a sketch follows the list):

  • The user-config override should always be treated as the total time (maybe minus planning time, explained below).
    • With this in mind, all the current timeout configurations during a query should also be treated as total time.
  • All stages get a fixed timeout at an absolute timestamp value = sys.currentTimeNano() + desiredTimeoutNano.
    • If any stage hits this timeout, they all agree to terminate.
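A rough Java sketch of that absolute-deadline idea (the class and method names here are made up for illustration, not the actual PR code):

```java
import java.util.concurrent.TimeUnit;

// Sketch of the absolute-deadline proposal above: every stage shares one
// deadline, so the timeout bounds the whole query rather than resetting
// per stage. Names are illustrative.
public final class StageDeadline {
  private final long _deadlineNanos;

  public StageDeadline(long timeoutMs) {
    _deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
  }

  /** Remaining budget, e.g. to forward to the next stage as its timeout. */
  public long remainingMs() {
    return TimeUnit.NANOSECONDS.toMillis(_deadlineNanos - System.nanoTime());
  }

  /** Any stage observing an expired deadline terminates; all stages agree. */
  public boolean isExpired() {
    return System.nanoTime() >= _deadlineNanos;
  }
}
```

One caveat worth noting: System.nanoTime() is only meaningful within a single JVM, so a deadline shared across servers would either have to be re-derived from a forwarded remaining budget at each hop or be expressed as a wall-clock epoch.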

walterddr avatar Oct 13 '22 19:10 walterddr

Maybe after the proposal is done, we can document this somewhere? Let me file an issue to track this.

61yao avatar Oct 13 '22 20:10 61yao

Do we agree that this is the end state?

The user specifies a query timeout T, which from the user's perspective is the end-to-end query timeout.

If Pinot does not finish processing the query within T, it should terminate execution (and ideally not leave any lingering execution resources).

In the current engine, T can be specified at the instance/cluster level (broker config), the query-option level, or the table level. The last one probably does not make sense for a multi-table query in the multi-stage engine.

The broker uses T (from whichever source it came), spends some time T' in query planning etc., and sends the remaining (T - T') to worker nodes when dispatching the plan. This is what the current engine does.

For the multi-stage engine as well, regardless of how many stages the plan is executed in, X = (T - T') is the max time the subsequent stages have.

If, for example, the leaf stage itself takes more time than X, it should terminate and interrupt its threads; downstream stages (parents) should detect that the timeout has expired, and the mailbox should not expect anything.

So the entire operator pipeline/chain gets terminated if X is exhausted anywhere (a sketch of the budget math follows below).
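A small sketch of that broker-side budget math, using the T / T' / X notation above (planQuery and dispatch are hypothetical stand-ins, not Pinot's actual API):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative broker-side budget computation following the T / T' / X
// notation above; planQuery(...) and dispatch(...) are stand-ins.
public final class BrokerBudgetExample {
  static void execute(String sql, long totalBudgetMs) throws TimeoutException {
    long planStartNanos = System.nanoTime();
    Object plan = planQuery(sql); // planning consumes T'
    long planningMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - planStartNanos);

    long remainingMs = totalBudgetMs - planningMs; // X = T - T'
    if (remainingMs <= 0) {
      throw new TimeoutException("Query timed out during planning");
    }
    dispatch(plan, remainingMs); // every stage shares the same X
  }

  private static Object planQuery(String sql) { return new Object(); }
  private static void dispatch(Object plan, long remainingMs) { /* no-op stub */ }
}
```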

siddharthteotia avatar Oct 14 '22 20:10 siddharthteotia

+1 on @siddharthteotia's comment.

Let's do the following:

  1. Ignore the table-level config for multi-stage (at least for now).
  2. Keep the rest exactly the same: make sure T is the e2e timeout by setting an absolute "must-finish-or-timeout" epoch that all processes check against (illustrated below).
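For illustration, a hypothetical operator-side check against that shared epoch (all names here are stand-ins, not the actual operator API):

```java
// Hypothetical operator that checks the shared "must-finish-or-timeout"
// epoch between blocks; the block-iteration methods are stand-ins.
public abstract class DeadlineAwareOperator {
  protected final long _deadlineNanos;

  protected DeadlineAwareOperator(long deadlineNanos) {
    _deadlineNanos = deadlineNanos;
  }

  /** Drains blocks, bailing out as soon as the shared deadline passes. */
  public final void run() {
    while (hasMoreBlocks()) {
      if (System.nanoTime() >= _deadlineNanos) {
        earlyTerminate("Timed out"); // every process checks the same epoch
        return;
      }
      processNextBlock();
    }
  }

  protected abstract boolean hasMoreBlocks();
  protected abstract void processNextBlock();
  protected abstract void earlyTerminate(String reason);
}
```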

walterddr avatar Oct 14 '22 22:10 walterddr

Readjusted the PR. PTAL.

walterddr avatar Nov 30 '22 00:11 walterddr