trino
trino copied to clipboard
Implement Query Info Pruned Endpoint
Description
This PR extends the Info endpoint on QueryResource - with the boolean parameter pruned
which returns a lightweight version of QueryInfo, by pruning the StageInfo object graph.
Additional context and related issues
The QueryInfo object returned by the QueryInfo endpoint can be very big and take a lot of memory, as it contains the entire plan, all the tasks, and all the buffers involved in the tasks, even though this data may not always be relevant.
On the QueryStateMachine
class there is already a method available to prune the QueryInfo
object, which this new endpoint leverages.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( x ) Release notes are required, with the following suggested text:
# General
* Add boolean query parameter to the QueryInfo endpoint that returns a lightweight response, by pruning the StageInfo object, with route `/v1/query/{queryId}?pruned=true`
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Maybe instead of adding a new endpoint we should add a query parameter to the existing endpoint?
/v1/query/{queryId}?pruned=true
Maybe instead of adding a new endpoint we should add a query parameter to the existing endpoint?
/v1/query/{queryId}?pruned=true
Definitely sounds like a good idea 👍 I'll update the PR to have this change later today, thanks for the suggestion!
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
@wendigo I've incorporated your feedback and sent the CLA via email, so this PR should be good for another review, when you have the time 🙏
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
One question: how do you plan to consume this endpoint? There is no usage of it so far in the Trino's codebase.
FYI @radek-starburst
@belen-pruvost , we recently merged https://github.com/trinodb/trino/pull/21075 . Please check whether it resolve your problem.
One question: how do you plan to consume this endpoint? There is no usage of it so far in the Trino's codebase.
We're calling it from a separate client which is in charge of queueing executions onto Trino, and which then gets execution metadata for running internal analytics (on things like popular datasets)
@belen-pruvost , we recently merged #21075 . Please check whether it resolve your problem.
Thanks for the pointer - I'll give it a try and report back here if it solves for our use case.
I've pushed the changes with the comments here addressed. I still need to test if using getResultQueryInfo
instead of pruneQueryInfo
would achieve something similar.
Unfortunately relying on BasicQueryInfo
(implemented on https://github.com/trinodb/trino/pull/21075/) won't work for us because the data we need are referencedTables
and routines
Maybe BasicQueryInfo
could be extended with routines
and referencedTables
?
cc: @electrum
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
I think we should have usage of this endpoint in Trino web ui or CLI or jdbc client to justify adding it to Trino server. Without seeing how it will be used, someone could easily refactor it in a way that breaks intended use case or decide to remove it altogether.
Currently the expected behaviour of the endpoint is safeguarded by a unit test, but I do understand your point. What would be a good usage of the endpoint in the Web UI or the CLI? I can try to give it a go.
I think we should have usage of this endpoint in Trino web ui or CLI or jdbc client to justify adding it to Trino server. Without seeing how it will be used, someone could easily refactor it in a way that breaks intended use case or decide to remove it altogether.
Currently the expected behaviour of the endpoint is safeguarded by a unit test, but I do understand your point. What would be a good usage of the endpoint in the Web UI or the CLI? I can try to give it a go.
I don't have any particular usage in mind, may be think about how you intended to use it and see if that can fit somewhere in Trino web UI.
I don't have any particular usage in mind, may be think about how you intended to use it and see if that can fit somewhere in Trino web UI.
I've added a Pruned JSON
tab to the query UI that will call this endpoint.
I've added a
Pruned JSON
tab to the query UI that will call this endpoint.
Why would anyone want to see "pruned JSON" in the UI if they can see normal JSON there?
cc @raghavsethi for UI change
Why would anyone want to see "pruned JSON" in the UI if they can see normal JSON there?
It does not add any value, I'll remove it and use the endpoint on existing functionality instead.
Maybe its worth having the additional view if it is just less overwhelming for a beginner user and still has valid info ..
I ended up following @mosabua's suggestion and implemented an additional view called References
, which shows Referenced Tables and Routines by making used of the pruned parameter.
A few screenshots of the feature working:
-
When there are several tables and routines being referenced:
-
When there are several tables and no routines being referenced:
-
When there are no tables and several routines being referenced:
-
When there are no tables or routines being referenced:
Please verify what is happening in terms of the CI test failures @belen-pruvost
Please verify what is happening in terms of the CI test failures @belen-pruvost
The failing test is TestMergeOperator#testMultipleStreamsSameOutputColumns
. I've tried running it locally but it succeeds, it may be a transient issue, but I can't retrigger the CI check. @mosabua is this something you can do? 🙏
Its definitely unrelated .. so its fine.
Any concerns with merging this @findepi @raunaqmorarka or @electrum ?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua