Add gauges and a rejections counter for MSE thread usage and limit when the thread-limiting feature is used
Currently, when you use the multi-stage engine (MSE) thread-limiting feature via configs such as:
pinot.server.query.executor.mse.max.execution.threads=1000
pinot.server.query.executor.mse.max.execution.threads.exceed.strategy=ERROR
you have no insight, through metrics, into the current thread usage or how close you are to the limit. Without this, we are also unable to catch issues in advance of when users might see them.
This PR adds two gauges:
MSE_EXECUTION_THREADS_MAX("threads", true, "Maximum allowed threads for multi-stage executor"),
MSE_EXECUTION_THREADS_CURRENT("threads", true, "Current number of threads in use by multi-stage executor");
And a metric for task rejections:
MSE_EXECUTION_THREADS_TASK_REJECTIONS("tasks", true, "Number of tasks rejected by multi-stage executor due to thread limit being exceeded"),
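To make the relationship between the three metrics concrete, here is a minimal sketch of a hard-limited executor that tracks a maximum, a current count, and a rejection count. It uses only JDK classes; the class `LimitedExecutorSketch` and its methods are hypothetical names for illustration and are not Pinot's actual executor or metrics API. The reject-on-exceed behavior is assumed to correspond to the ERROR strategy in the config above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not Pinot's implementation. */
class LimitedExecutorSketch {
  private final ExecutorService delegate;
  // Plays the role of MSE_EXECUTION_THREADS_MAX.
  private final int maxThreads;
  // Plays the role of MSE_EXECUTION_THREADS_CURRENT.
  private final AtomicInteger currentThreads = new AtomicInteger();
  // Plays the role of MSE_EXECUTION_THREADS_TASK_REJECTIONS.
  private final AtomicLong rejections = new AtomicLong();

  LimitedExecutorSketch(ExecutorService delegate, int maxThreads) {
    this.delegate = delegate;
    this.maxThreads = maxThreads;
  }

  void execute(Runnable task) {
    // Reserve a slot first; roll back and reject if that exceeds the limit.
    if (currentThreads.incrementAndGet() > maxThreads) {
      currentThreads.decrementAndGet();
      rejections.incrementAndGet();
      throw new RejectedExecutionException("Thread limit of " + maxThreads + " exceeded");
    }
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          // Release the slot when the task finishes, even if it throws.
          currentThreads.decrementAndGet();
        }
      });
    } catch (RuntimeException e) {
      // The delegate refused the task, so the reserved slot was never used.
      currentThreads.decrementAndGet();
      throw e;
    }
  }

  int getMax() {
    return maxThreads;
  }

  int getCurrent() {
    return currentThreads.get();
  }

  long getRejections() {
    return rejections.get();
  }
}
```

With gauges like these exported, an alert on the ratio of current to max would fire before any task is rejected, while the rejection count shows when the limit was actually hit.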
It also updates the test, using code similar to the existing tests.
cc @yashmayya @Jackie-Jiang are you able to review this? Thanks
:x: 12 Tests Failed:
| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 12595 | 12 | 12583 | 48 |
View the top 3 failed test(s) by shortest run time
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select C_CITY, S_CITY, D_YEAR, sum(LO_REVENUE) as revenue from customer, lineorder, supplier, dates where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_ORDERDATE = D_DATEKEY and (C_CITY='UNITED KI1' or C_CITY='UNITED KI5') and (S_CITY='UNITED KI1' or S_CITY='UNITED KI5') and D_YEARMONTH = 'Jul1995' group by C_CITY, S_CITY, D_YEAR order by D_YEAR asc, revenue desc; ](10)Stack Traces | 0.084s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select C_CITY, S_CITY, D_YEAR, sum(LO_REVENUE) as revenue from customer, lineorder, supplier, dates where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_ORDERDATE = D_DATEKEY and C_NATION = 'UNITED STATES' and S_NATION = 'UNITED STATES' and D_YEAR >= 1992 and D_YEAR <= 1997 group by C_CITY, S_CITY, D_YEAR order by D_YEAR asc, revenue desc; ](8)Stack Traces | 0.092s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select D_YEAR, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) as profit from lineorder, dates, customer, supplier, part where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 'AMERICA' and S_REGION = 'AMERICA' and (D_YEAR = 1997 or D_YEAR = 1998) and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2') group by D_YEAR, S_NATION, P_CATEGORY order by D_YEAR, S_NATION, P_CATEGORY; ](12)Stack Traces | 0.092s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries Stack Traces | 0.093s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select D_YEAR, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) as profit from lineorder, customer, supplier, part, dates where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 'AMERICA' and S_REGION = 'AMERICA' and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2') group by D_YEAR, C_NATION order by D_YEAR, C_NATION; ](11)Stack Traces | 0.097s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select C_CITY, S_CITY, D_YEAR, sum(LO_REVENUE) as revenue from customer, lineorder, supplier, dates where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_ORDERDATE = D_DATEKEY and (C_CITY='UNITED KI1' or C_CITY='UNITED KI5') and (S_CITY='UNITED KI1' or S_CITY='UNITED KI5') and D_YEAR >= 1992 and D_YEAR <= 1997 group by C_CITY, S_CITY, D_YEAR order by D_YEAR asc, revenue desc; ](9)Stack Traces | 0.102s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select D_YEAR, S_CITY, P_BRAND1, sum(LO_REVENUE - LO_SUPPLYCOST) as profit from lineorder, dates, customer, supplier, part where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 'AMERICA' and S_NATION = 'UNITED STATES' and (D_YEAR = 1997 or D_YEAR = 1998) and P_CATEGORY = 'MFGR#14' group by D_YEAR, S_CITY, P_BRAND1 order by D_YEAR, S_CITY, P_BRAND1; ](13)Stack Traces | 0.112s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select C_NATION, S_NATION, D_YEAR, sum(LO_REVENUE) as revenue from customer, lineorder, supplier, dates where LO_CUSTKEY = C_CUSTKEY and LO_SUPPKEY = S_SUPPKEY and LO_ORDERDATE = D_DATEKEY and C_REGION = 'ASIA' and S_REGION = 'ASIA' and D_YEAR >= 1992 and D_YEAR <= 1997 group by C_NATION, S_NATION, D_YEAR order by D_YEAR asc, revenue desc; ](7)Stack Traces | 0.129s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select sum(CAST(LO_REVENUE AS DOUBLE)), D_YEAR, P_BRAND1 from lineorder, dates, part, supplier where LO_ORDERDATE = D_DATEKEY and LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and P_BRAND1 = 'MFGR#2221' and S_REGION = 'EUROPE' group by D_YEAR, P_BRAND1 order by D_YEAR, P_BRAND1; ](6)Stack Traces | 0.137s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select sum(CAST(LO_REVENUE AS DOUBLE)), D_YEAR, P_BRAND1 from lineorder, dates, part, supplier where LO_ORDERDATE = D_DATEKEY and LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and P_BRAND1 between 'MFGR#2221' and 'MFGR#2228' and S_REGION = 'ASIA' group by D_YEAR, P_BRAND1 order by D_YEAR, P_BRAND1; ](5)Stack Traces | 0.152s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.SSBQueryIntegrationTest::testSSBQueries[select sum(CAST(LO_REVENUE AS DOUBLE)), D_YEAR, P_BRAND1 from lineorder, dates, part, supplier where LO_ORDERDATE = D_DATEKEY and LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and P_CATEGORY = 'MFGR#12' and S_REGION = 'AMERICA' group by D_YEAR, P_BRAND1 order by D_YEAR, P_BRAND1; ](4)Stack Traces | 0.242s run time
Query had processing exceptions: [{"errorCode":235,"message":"Found 1 unavailable segments for table supplier: [supplier_0 %]"}]
org.apache.pinot.integration.tests.UpsertTableIntegrationTest::testUpsertCompactionWithSoftDelete Stack Traces | 607s run time
Failed to meet condition in 600000ms, error message: Failed to load all documents
View the full list of 1 :snowflake: flaky test(s)
org.apache.pinot.common.utils.tls.RenewableTlsUtilsTest::reloadSslFactoryWhenFileStoreChanges
Flake rate in main: 95.00% (Passed 3 times, Failed 57 times)
Stack Traces | 0.656s run time
did not expect [sun.security.rsa.RSAPrivateCrtKeyImpl@771701d] but found [sun.security.rsa.RSAPrivateCrtKeyImpl@771701d]
@jackie the sleep is needed: without Thread.sleep, the tasks complete immediately, so currentGauge drops to 0 before we can check it. I think the existing tests use it for similar reasons.
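The timing issue can be illustrated with a small JDK-only sketch: a "current" gauge is only observable while tasks are still running, so the test has to keep them alive until the check happens. The sketch below holds the tasks with a CountDownLatch instead of a sleep; this is just one possible way to show the idea and is not what the PR's test does. The class `GaugeTimingSketch`, the method `observeWhileRunning`, and the local `currentGauge` counter are hypothetical stand-ins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not the test code from this PR. */
class GaugeTimingSketch {
  /** Starts {@code tasks} blocking tasks and returns the gauge value seen while all are running. */
  static int observeWhileRunning(int tasks) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    AtomicInteger currentGauge = new AtomicInteger();
    CountDownLatch started = new CountDownLatch(tasks);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          currentGauge.incrementAndGet();
          started.countDown();
          try {
            // Keep the task alive; if it returned right away the gauge
            // could already be back at 0 when the caller reads it.
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            currentGauge.decrementAndGet();
          }
        });
      }
      // Wait until every task has incremented the gauge, then read it.
      started.await();
      return currentGauge.get();
    } finally {
      // Let the tasks finish and shut the pool down.
      release.countDown();
      pool.shutdown();
    }
  }
}
```

If the tasks were empty Runnables, nothing would guarantee that any of them is still counted when the gauge is read, which is the situation the sleep in the test guards against.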
The test failures look unrelated. Can you re-run, @Jackie-Jiang?