Add support for Cursors through API Query Params
Cursor support will allow Pinot clients to consume query results in smaller chunks. This feature allows clients to work with lesser resources esp. memory. Application logic is simpler with cursors. For example an app UI paginates through results in a table or a graph. Cursor support has been implemented using APIs.
Implementation for #13185
API
POST /query/sql
A new broker API parameter has been added to trigger pagination. The API accepts the following new optional query parameters:
- getCursor(boolean):
- numRows (int): The number of rows to return in the first page.
The response contains the following extra fields:
| Field | Description |
|---|---|
| brokerHost | hostname of the processing broker |
| brokerPort | port of the processing broker |
| offset | starting offset of the result table slice in the cursor response |
| numRows | Number of rows in result table slice in the cursor response |
| cursorResultWriteTimeMs | Time taken to write the query response to ResponseStore |
| cursorFetchTimeMs | Time taken to read a slice of the result table from ResponseStore |
| submissionTime | Unix timestamp when the query was submitted |
| expirationTime | Unix timestamp when the response can be deleted from the ResponseStore |
| bytesWritten | number of bytes written to the response store when storing the result table |
GET /resultStore/{requestId}/results
This is a broker API that can be used to iterate over the result set of a query submitted using the above API. The API accepts the following query parameters:
- offset (int) (required): The start offset of the page of results.
- numRows (int) (optional): The number of rows in the page. By default it will use the default size.
GET /resultStore/{requestId}/
Returns the BrokerResponse metadata of the query.
GET /resultStore
Lists all the requestIds of all the query results available in the response store.
DELETE /resultStore/{requestId}/
Delete the results of a query. The API accepts the following query parameters:
- requestId (required)
SPI
The PR implements a FileSystem ResponseStore and a JSON ResponseSerde.
The feature provides two SPIs to extend the feature to support other implementations:
- ResponseSerde: Serialize/Deserialize the response.
- ResponseStore: Store responses in a storage system. Both SPIs use Java SPI and the default ServiceLoader to find implementation of the SPIs. All implementation should be annotated with AutoService to help generate files for discovering the implementations.
Configuration
ResponseStore
| Configuration | Default | Description |
|---|---|---|
| pinot.broker.cursor.response.store.type | file | The protocol to use for storage |
| pinot.broker.cursor.response.store.serde | json | The Serialization/Deserialization protocol to use for the result set |
File Response Store
| Configuration | Default | Description |
|---|---|---|
| pinot.broker.cursor.response.store.file.data.dir | /tmp/pinot/broker/response_store/data | Location where result files will be stored. |
| pinot.broker.cursor.response.store.file.data.dir | file:///tmp/pinot/broker/response_store/data | Location where temporary files will be created. |
| pinot.broker.cursor.response.store.file.extension | json | The file name extension |
Miscellaneous
| Configuration | Default | Description |
|---|---|---|
| pinot.broker.cursor.result.size | 10000 | The result size if numRows is not specified in the API call. |
| pinot.broker.cursor.response.store.expiration | 1h | The time before a query result will be deleted. |
| controller.cluster.response.store.cleaner.frequencyPeriod | 1h | The frequency of the periodic task that deletes expired query results |
| controller.cluster.response.store.cursor.cleaner.initialDelay | random | The initial delay before the first run of the periodic task. |
tags: feature, multi-stage, release-notes
Codecov Report
Attention: Patch coverage is 14.06250% with 385 lines in your changes missing coverage. Please review.
Project coverage is 63.90%. Comparing base (
59551e4) to head (ce3c4d7). Report is 1502 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #14110 +/- ##
============================================
+ Coverage 61.75% 63.90% +2.14%
- Complexity 207 1608 +1401
============================================
Files 2436 2713 +277
Lines 133233 149658 +16425
Branches 20636 22908 +2272
============================================
+ Hits 82274 95634 +13360
- Misses 44911 47012 +2101
- Partials 6048 7012 +964
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.87% <14.06%> (+2.16%) |
:arrow_up: |
| java-21 | 63.75% <14.06%> (+2.12%) |
:arrow_up: |
| skip-bytebuffers-false | 63.89% <14.06%> (+2.15%) |
:arrow_up: |
| skip-bytebuffers-true | 63.71% <14.06%> (+35.98%) |
:arrow_up: |
| temurin | 63.90% <14.06%> (+2.14%) |
:arrow_up: |
| unittests | 63.89% <14.06%> (+2.15%) |
:arrow_up: |
| unittests1 | 56.26% <3.03%> (+9.37%) |
:arrow_up: |
| unittests2 | 34.41% <14.06%> (+6.68%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Partial review
@gortiz @jackjlli Listing the open q.'s
Design:
- ManualAuthorization to read response from response store. I have implement authorization by using tables queried in the request.
- SPI does not support reading partial responses. Currently the SPI has been tried with a simple file system as backend. The SPI can be improved incrementally & made more complex when more sophisticated backends are used
Rest are coding standards/craftmanship comments. I have either addressed or responded to the conversations.
PTAL when you get a chance.
All the failures for commit https://github.com/apache/pinot/pull/14110/commits/8b72aa8946aa77d903466458c1ad3aa43d2b94a9 are in ingestion tests and are unrelated to this feature.
I have added javadoc in ResponseStore to emphasis that there is a response store per broker and it is assumed that it is a shared-nothing storage. This is far more simpler and from experience it is not a disadvantage. In OSS there is no concept of a proxy and clients connect to a specific broker. It is trivial logic to remember the broker and submit subsequent requests OR get the broker host/port from the cursor response.
Pending discussions:
- shared-nothing vs shared-storage
- cursor response constructor
- Naming - Response Store
- Auth config param