trino icon indicating copy to clipboard operation
trino copied to clipboard

Implement Query Info Pruned Endpoint

Open belen-pruvost opened this issue 11 months ago • 29 comments

Description

This PR extends the Info endpoint on QueryResource - with the boolean parameter pruned which returns a lightweight version of QueryInfo, by pruning the StageInfo object graph.

Additional context and related issues

The QueryInfo object returned by the QueryInfo endpoint can be very big and take a lot of memory, as it contains the entire plan, all the tasks, and all the buffers involved in the tasks, even though this data may not always be relevant.

On the QueryStateMachine class there is already a method available to prune the QueryInfo object, which this new endpoint leverages.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( x ) Release notes are required, with the following suggested text:

# General
* Add boolean query parameter to the QueryInfo endpoint that returns a lightweight response, by pruning the StageInfo object, with route `/v1/query/{queryId}?pruned=true` 

belen-pruvost avatar Feb 26 '24 14:02 belen-pruvost

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Feb 26 '24 14:02 cla-bot[bot]

Maybe instead of adding a new endpoint we should add a query parameter to the existing endpoint?

/v1/query/{queryId}?pruned=true

wendigo avatar Feb 26 '24 20:02 wendigo

Maybe instead of adding a new endpoint we should add a query parameter to the existing endpoint?

/v1/query/{queryId}?pruned=true

Definitely sounds like a good idea 👍 I'll update the PR to have this change later today, thanks for the suggestion!

belen-pruvost avatar Feb 27 '24 09:02 belen-pruvost

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Feb 27 '24 15:02 cla-bot[bot]

@wendigo I've incorporated your feedback and sent the CLA via email, so this PR should be good for another review, when you have the time 🙏

belen-pruvost avatar Feb 27 '24 15:02 belen-pruvost

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Feb 28 '24 11:02 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Feb 28 '24 11:02 cla-bot[bot]

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Mar 25 '24 17:03 github-actions[bot]

One question: how do you plan to consume this endpoint? There is no usage of it so far in the Trino's codebase.

wendigo avatar Mar 25 '24 17:03 wendigo

FYI @radek-starburst

hashhar avatar Mar 26 '24 09:03 hashhar

@belen-pruvost , we recently merged https://github.com/trinodb/trino/pull/21075 . Please check whether it resolve your problem.

radek-kondziolka avatar Mar 26 '24 10:03 radek-kondziolka

One question: how do you plan to consume this endpoint? There is no usage of it so far in the Trino's codebase.

We're calling it from a separate client which is in charge of queueing executions onto Trino, and which then gets execution metadata for running internal analytics (on things like popular datasets)

belen-pruvost avatar Mar 26 '24 10:03 belen-pruvost

@belen-pruvost , we recently merged #21075 . Please check whether it resolve your problem.

Thanks for the pointer - I'll give it a try and report back here if it solves for our use case.

belen-pruvost avatar Mar 26 '24 10:03 belen-pruvost

I've pushed the changes with the comments here addressed. I still need to test if using getResultQueryInfo instead of pruneQueryInfo would achieve something similar.

belen-pruvost avatar Mar 26 '24 11:03 belen-pruvost

Unfortunately relying on BasicQueryInfo (implemented on https://github.com/trinodb/trino/pull/21075/) won't work for us because the data we need are referencedTables and routines

belen-pruvost avatar Mar 26 '24 11:03 belen-pruvost

Maybe BasicQueryInfo could be extended with routines and referencedTables? cc: @electrum

radek-kondziolka avatar Mar 26 '24 12:03 radek-kondziolka

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Apr 17 '24 18:04 github-actions[bot]

I think we should have usage of this endpoint in Trino web ui or CLI or jdbc client to justify adding it to Trino server. Without seeing how it will be used, someone could easily refactor it in a way that breaks intended use case or decide to remove it altogether.

Currently the expected behaviour of the endpoint is safeguarded by a unit test, but I do understand your point. What would be a good usage of the endpoint in the Web UI or the CLI? I can try to give it a go.

belen-pruvost avatar Apr 19 '24 06:04 belen-pruvost

I think we should have usage of this endpoint in Trino web ui or CLI or jdbc client to justify adding it to Trino server. Without seeing how it will be used, someone could easily refactor it in a way that breaks intended use case or decide to remove it altogether.

Currently the expected behaviour of the endpoint is safeguarded by a unit test, but I do understand your point. What would be a good usage of the endpoint in the Web UI or the CLI? I can try to give it a go.

I don't have any particular usage in mind, may be think about how you intended to use it and see if that can fit somewhere in Trino web UI.

raunaqmorarka avatar Apr 19 '24 10:04 raunaqmorarka

I don't have any particular usage in mind, may be think about how you intended to use it and see if that can fit somewhere in Trino web UI.

I've added a Pruned JSON tab to the query UI that will call this endpoint.

belen-pruvost avatar Apr 26 '24 07:04 belen-pruvost

I've added a Pruned JSON tab to the query UI that will call this endpoint.

Why would anyone want to see "pruned JSON" in the UI if they can see normal JSON there?

cc @raghavsethi for UI change

findepi avatar Apr 26 '24 09:04 findepi

Why would anyone want to see "pruned JSON" in the UI if they can see normal JSON there?

It does not add any value, I'll remove it and use the endpoint on existing functionality instead.

belen-pruvost avatar Apr 26 '24 16:04 belen-pruvost

Maybe its worth having the additional view if it is just less overwhelming for a beginner user and still has valid info ..

mosabua avatar Apr 26 '24 16:04 mosabua

I ended up following @mosabua's suggestion and implemented an additional view called References, which shows Referenced Tables and Routines by making used of the pruned parameter.

A few screenshots of the feature working:

  • When there are several tables and routines being referenced: Screenshot 2024-04-26 at 19 49 16

  • When there are several tables and no routines being referenced: Screenshot 2024-04-26 at 19 49 21

  • When there are no tables and several routines being referenced: Screenshot 2024-04-26 at 19 49 27

  • When there are no tables or routines being referenced: Screenshot 2024-04-26 at 19 49 34

belen-pruvost avatar Apr 26 '24 18:04 belen-pruvost

Please verify what is happening in terms of the CI test failures @belen-pruvost

mosabua avatar Apr 26 '24 20:04 mosabua

Please verify what is happening in terms of the CI test failures @belen-pruvost

The failing test is TestMergeOperator#testMultipleStreamsSameOutputColumns. I've tried running it locally but it succeeds, it may be a transient issue, but I can't retrigger the CI check. @mosabua is this something you can do? 🙏

belen-pruvost avatar Apr 26 '24 20:04 belen-pruvost

Its definitely unrelated .. so its fine.

mosabua avatar Apr 26 '24 21:04 mosabua

Any concerns with merging this @findepi @raunaqmorarka or @electrum ?

mosabua avatar Apr 26 '24 21:04 mosabua

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar May 20 '24 17:05 github-actions[bot]

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jun 19 '24 17:06 github-actions[bot]