Database query quota
Goal
- Ability to impose query rate limits at database level. This will consider all the queries that are made to the tables present under the database.
- Way to specify query quota at database level
Introduce database rate limiter
Pinot decides if the query quota is reached or not with the help of QueryQuotaManager.
QueryQuotaManager interface right now has an acquire(tableName) method which decides if the quota for the provided table is reached or not by returning a boolean.
To allow similar mechanism for database query quota a new interface method acquireDatabase(databaseName) is introduced on QueryQuotaManager interface
Query quota provisioning
Right now as the query and storage quotas on tables are maintained at TableConfig itself, we follow the same approach for database quota configs as well.
We don't have a place to provide database specific configs as it not being a first class entity. But for this usecase and similar future usecases we need to maintain a separate DatabaseConfig znode. Assumptions on the new DatabaseConfig znode:
- This config does not represent the logical database entity.
- Absence of this config does not prevent users from creating tables under a database.
- Deletion of thus config does not delete the tables under the database.
Provide default database query quota in ClusterConfig as below
"databaseMaxQueriesPerSecond" : "1000"
If we wish to provide an override for a particular database we can do so by creating a znode under PROPERTYSTORE/CONFIGS/DATABASE/<databaseName>
{
"databaseName" : "db1",
"quota" : {
"maxQueriesPerSecond" : "200"
}
}
Exposed APIs on controller to manage this:
-
POST /databases/{databaseName}/quotas?maxQueriesPerSecond=<query quota> -
GET /databases/{databaseName}/quotas
Query quota update handling
There are 2 source based on which the database query quota is decided
- Default query quota provided at cluster config
- Override query quota provided at database config
On top of the overall quota provided by above configs, the per broker quota is also affected by the number of live brokers in the cluster. Considering all the above factors, rate limiter updates are handled as below.
- Introduced new user defined message
DatabaseConfigRefreshMessagewhich is sent to brokers upon the database config updates. - Added
ClusterConfigChangeListenertoClusterChangeMediatorto handle the changes made to database query quota in cluster configs. - Update per broker rate quota upon broker resource EV change
- Create database rate limiter upon
OFFLINE->ONLINEtransition of tables inBrokerResourceOnlineOfflineStateModel
We may eventually end up using Database feature. I have a couple of related questions
- Doe the DB level quota get even distributed across all tables in that DB ?
- On a related note, how does the math work for updating / maintaining and enforcing quotas when there is both table level and DB level quota configured ? Is there a precedent ?
- Can we add a new table to an existing DB where that table level quota is exceeding the DB level quota ?
@siddharthteotia
Does the DB level quota get even distributed across all tables in that DB ?
The db level quota is separate from table level quota. Each table has it's separate quota and individually or cumulatively it may exceed the db level quota. But queries will be blocked once the db level quota is reached. Its upto the db admin to ensure proper table level quotas are provided so that tables don’t end up contesting for the db quota share.
On a related note, how does the math work for updating / maintaining and enforcing quotas when there is both table level and DB level quota configured ? Is there a precedent ?
As mentioned above these 2 quotas are separate. For any incoming query both the table quota and db quota is checked and the query will fail if anyone of them is exceeding.
Can we add a new table to an existing DB where that table level quota is exceeding the DB level quota ?
Yes. We don't enforce the rule that sum of all table quotas under a db must not exceed db quota. Cluster/DB maintainer should take the call on what table quota should be and what db quota should be. Not necessary all tables may reach their quota at the same time. DB quota just ensures that at any point the particular db is not running more than x queries and does not affect other DBs in the cluster.
Codecov Report
Attention: Patch coverage is 41.15226% with 143 lines in your changes missing coverage. Please review.
Project coverage is 61.92%. Comparing base (
59551e4) to head (912b4ae). Report is 819 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #13544 +/- ##
============================================
+ Coverage 61.75% 61.92% +0.17%
+ Complexity 207 198 -9
============================================
Files 2436 2557 +121
Lines 133233 141046 +7813
Branches 20636 21935 +1299
============================================
+ Hits 82274 87349 +5075
- Misses 44911 47042 +2131
- Partials 6048 6655 +607
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration2 | 0.00% <0.00%> (ø) |
|
| java-11 | 61.88% <41.15%> (+0.17%) |
:arrow_up: |
| java-21 | 61.82% <41.15%> (+0.19%) |
:arrow_up: |
| skip-bytebuffers-false | 61.91% <41.15%> (+0.16%) |
:arrow_up: |
| skip-bytebuffers-true | 61.81% <41.15%> (+34.08%) |
:arrow_up: |
| temurin | 61.92% <41.15%> (+0.17%) |
:arrow_up: |
| unittests | 61.92% <41.15%> (+0.17%) |
:arrow_up: |
| unittests1 | 46.36% <0.00%> (-0.53%) |
:arrow_down: |
| unittests2 | 27.79% <41.15%> (+0.06%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Are there tests for setting/updating qps due to helix messages for table qps ? Do you plan to tests those out as well ?
@vrajat I am not sure how to specifically test that, will try to add a new integration test for checking query quota. I have done manual testing to ensure all the different flows work. Maybe I can add a section on what all manual tests were performed.
The tests LGTM