query history: decouple query results data from query record

Open • YaroslavLitvinov opened this issue 8 months ago • 1 comment

This issue was created for future planning, to be picked up once it can be prioritized.

Decouple query results data from query record

We currently run queries in blocking mode: once a query is submitted, execution waits for it to complete and then saves the query and its results in a single QueryRecord. As a result, queries are returned from history together with all of their result data.

Reference implementation behavior

In the reference implementation, all queries are stored permanently, except for their results, which have an expiration time.

In Worksheets, once a query expires, it remains listed in the worksheet's query history, but its results become unavailable and the handler returns 404 when they are requested.
In All queries, expired queries disappear completely, leaving no trace.
Queries returned from history do not include the results blob; it is requested separately when the user clicks on a specific query.
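A minimal sketch of the Worksheets behavior described above, assuming each stored result carries an `expires_at` timestamp in Unix seconds. `ResultStore`, `StoredResult`, and `get_query_result_handler` are hypothetical names, and status codes are plain `u16` values rather than any web framework's types.

```rust
use std::collections::HashMap;

/// Hypothetical stored result blob with an expiration time (Unix seconds, assumed).
struct StoredResult {
    blob: Vec<u8>,
    expires_at: u64,
}

/// Hypothetical result storage, keyed by query id.
struct ResultStore {
    results: HashMap<u64, StoredResult>,
}

impl ResultStore {
    /// Returns the blob only if it exists and has not expired at `now`.
    fn get_live(&self, id: u64, now: u64) -> Option<&[u8]> {
        self.results
            .get(&id)
            .filter(|r| now < r.expires_at)
            .map(|r| r.blob.as_slice())
    }
}

/// Hypothetical handler: 200 with the blob, or 404 once the result is missing or expired.
/// The query record itself is stored elsewhere and stays in the worksheet history.
fn get_query_result_handler(store: &ResultStore, id: u64, now: u64) -> (u16, Vec<u8>) {
    match store.get_live(id, now) {
        Some(blob) => (200, blob.to_vec()),
        None => (404, Vec::new()),
    }
}

fn main() {
    let mut results = HashMap::new();
    results.insert(1, StoredResult { blob: b"rows".to_vec(), expires_at: 1_000 });
    let store = ResultStore { results };

    println!("{}", get_query_result_handler(&store, 1, 999).0); // 200
    println!("{}", get_query_result_handler(&store, 1, 1_000).0); // 404
}
```

Because expiry is checked only on the result side, the query record never needs to change when its results expire.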

Changes required to match the reference implementation's behavior:

The query record contains all data except the results blob, and is saved under the same key, "/qh/{id}".
Query results data is saved under a separate key, "/qhd/{id}", with value = "results-blob".
Queries returned from history don't include the results blob. For small workloads this may not improve performance, but for workloads with large result sets it would improve the UI/UX.
The results blob is requested separately: the handler returns a single result set per call.
Result sets can be removed in bulk if they fall within a requested range /qhd/{ts1} : /qhd/{ts2}, where ts1 > ts2, since keys are saved in descending order. After the split, removing a result is as simple as removing a key from SlateDB, which makes removal safe and easy.
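The key layout and bulk removal above can be sketched as follows, assuming ids are `u64` timestamps and using a `BTreeMap` as an in-memory stand-in for a sorted key-value store (this does not use SlateDB's API). The descending encoding (`u64::MAX - ts`, zero-padded) is one possible choice; `desc_id`, `record_key`, `result_key`, and `remove_results_in_range` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a sorted key-value store such as SlateDB
/// (this sketch does not use SlateDB's API).
type Store = BTreeMap<String, Vec<u8>>;

/// Assumption: ids are u64 timestamps. Subtracting from u64::MAX and
/// zero-padding to 20 digits makes ascending key order = newest first.
fn desc_id(ts: u64) -> String {
    format!("{:020}", u64::MAX - ts)
}

fn record_key(ts: u64) -> String {
    format!("/qh/{}", desc_id(ts))
}

fn result_key(ts: u64) -> String {
    format!("/qhd/{}", desc_id(ts))
}

/// Removes all result blobs with ts2 <= ts <= ts1 (ts1 >= ts2) and returns
/// how many were removed. Query records under "/qh/" are not touched.
fn remove_results_in_range(store: &mut Store, ts1: u64, ts2: u64) -> usize {
    assert!(ts1 >= ts2, "ts1 must not be older than ts2");
    // Descending encoding: the newer bound (ts1) produces the smaller key.
    let (start, end) = (result_key(ts1), result_key(ts2));
    let doomed: Vec<String> = store.range(start..=end).map(|(k, _)| k.clone()).collect();
    for key in &doomed {
        store.remove(key);
    }
    doomed.len()
}

fn main() {
    let mut store = Store::new();
    for ts in [100, 200, 300] {
        store.insert(record_key(ts), b"query record".to_vec());
        store.insert(result_key(ts), b"results-blob".to_vec());
    }
    let removed = remove_results_in_range(&mut store, 250, 150);
    println!("removed {removed} result(s), {} keys left", store.len()); // removed 1 result(s), 5 keys left
}
```

Since "/qh/" and "/qhd/" are disjoint prefixes, a range over result keys can never reach a query record, which is what makes the bulk removal safe.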

Problems solved

Splitting QueryRecord into QueryRecord + QueryResult is a prerequisite for non-blocking asynchronous query execution. Otherwise, a newly created query record would have to be updated right after creation to add its results; with the split, the query is added first and its result is added later as a separate entry.
A separate key for results data makes removing outdated query results a safe and easy operation, as simple as removing a key from SlateDB.

Apr 16 '25 08:04 YaroslavLitvinov

Problem

As reported in https://github.com/Embucket/embucket/issues/1833, Embucket has trouble loading a single page (250 items) of query history. This happens because every QueryRecord embeds its result-set data. To fix this, we can save the QueryResult to the history directly, separately from the QueryRecord, and have the QueryRecord only refer to it. This should speed up SlateDB scans and reduce the amount of data the UI page loads. Query results are requested separately.
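To illustrate why the split reduces the data loaded per page, here is a minimal sketch that scans one 250-item page under the "/qh/" prefix. It assumes a `BTreeMap` as a stand-in for SlateDB's sorted keyspace; `scan_prefix` and `PAGE_SIZE` are hypothetical names, and the key suffixes are simplified.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a sorted key-value store such as SlateDB.
type Store = BTreeMap<String, Vec<u8>>;

/// One UI page of query history (250 items, per the issue).
const PAGE_SIZE: usize = 250;

/// Returns up to `limit` values whose keys start with `prefix`, in key order.
fn scan_prefix<'a>(store: &'a Store, prefix: &str, limit: usize) -> Vec<&'a [u8]> {
    store
        .range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .take(limit)
        .map(|(_, v)| v.as_slice())
        .collect()
}

fn main() {
    let mut store = Store::new();
    for i in 0..300 {
        // Small record metadata under "/qh/", a large result blob under "/qhd/".
        store.insert(format!("/qh/{i:05}"), b"meta".to_vec());
        store.insert(format!("/qhd/{i:05}"), vec![0u8; 16 * 1024]);
    }
    let page = scan_prefix(&store, "/qh/", PAGE_SIZE);
    let bytes: usize = page.iter().map(|v| v.len()).sum();
    println!("{} records, {} bytes", page.len(), bytes); // 250 records, 1000 bytes
}
```

The page scan stops at the end of the "/qh/" prefix, so none of the result blobs are read, however large they are.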

Proposal

Eliminate QueryRecord::result, so that no QueryRecord holds result data directly, only the result's metadata.
Extend the QueryHistory trait with additional functions: add_query_result and get_query_result.
Extend the api-ui REST API handlers to reflect these changes.
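A simplified, synchronous sketch of the proposed trait extension. Only the names `QueryRecord`, `QueryResult`, `QueryHistory`, `add_query_result`, and `get_query_result` come from the proposal; the fields, `add_query`, the signatures, and `InMemoryHistory` are assumptions for illustration (the real trait is likely async and returns errors).

```rust
use std::collections::HashMap;

/// Query record without an embedded result set. Hypothetical fields, not
/// Embucket's actual `QueryRecord`; result metadata (e.g. row count) omitted.
#[derive(Clone, Debug, PartialEq)]
struct QueryRecord {
    id: u64,
    query: String,
}

/// Result data stored separately, under "/qhd/{id}" in the proposal.
#[derive(Clone, Debug, PartialEq)]
struct QueryResult {
    query_id: u64,
    blob: Vec<u8>,
}

/// Simplified, synchronous take on the extended trait. The real trait is
/// likely async and fallible; these signatures are assumptions.
trait QueryHistory {
    fn add_query(&mut self, record: QueryRecord);
    fn add_query_result(&mut self, result: QueryResult);
    fn get_query_result(&self, query_id: u64) -> Option<&QueryResult>;
}

/// In-memory implementation used only for illustration.
#[derive(Default)]
struct InMemoryHistory {
    records: HashMap<u64, QueryRecord>,
    results: HashMap<u64, QueryResult>,
}

impl QueryHistory for InMemoryHistory {
    fn add_query(&mut self, record: QueryRecord) {
        self.records.insert(record.id, record);
    }

    fn add_query_result(&mut self, result: QueryResult) {
        // The existing record is not rewritten; the result gets its own entry.
        self.results.insert(result.query_id, result);
    }

    fn get_query_result(&self, query_id: u64) -> Option<&QueryResult> {
        self.results.get(&query_id)
    }
}

fn main() {
    let mut history = InMemoryHistory::default();
    // Non-blocking flow: record the query first...
    history.add_query(QueryRecord { id: 7, query: "SELECT 1".to_string() });
    println!("{}", history.get_query_result(7).is_some()); // false
    // ...then attach its result once execution finishes.
    history.add_query_result(QueryResult { query_id: 7, blob: b"1".to_vec() });
    println!("{}", history.get_query_result(7).is_some()); // true
}
```

Keeping results in a separate map (or key prefix) lets a history listing return `QueryRecord`s alone, while a REST handler calls `get_query_result` only when a specific query is opened.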

@rampage644

Oct 14 '25 10:10 YaroslavLitvinov