redash icon indicating copy to clipboard operation
redash copied to clipboard

Add useQueryAnnotation option to BigQuery Runner

Open htamakos opened this issue 2 years ago • 4 comments

What type of PR is this? (check all applicable)

  • [x] Feature

Description

I have added the useQueryAnnotation option to the BigQuery Query Runner. This option controls whether Query Annotaion is done or not at query execution time. When initializing the redash.query_runner.big_query.BigQuery instance, set should_annotate_query to True if useQueryAnnotation is True, otherwise set it to False If not, set it to False.

~~### Implementation notes~~ ~~BigQuery can be controlled whether to execute legacy or standard SQL by setting the leading comment to #legacySQL or #starndardSQL.~~ ~~Reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix~~

~~Therefore, the annotate_query method is used to determine if there is a query prefix, and if so, it is prefixed with the query prefix and given a query annotation after it~~

Why do we need this option?

When using BigQuery with redash, it is likely that multiple users will use the service account for redash via the data source. In that case, there is a need to identify users who are executing very heavy queries from the query execution history. In the absence of query annotations, this identification is very difficult. However, it may not be necessary for existing users, so we have disabled the option by default so that it will not affect existing users.

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

htamakos avatar Dec 14 '21 05:12 htamakos

スクリーンショット 2021-12-15 9 20 50

htamakos avatar Dec 15 '21 00:12 htamakos

Thanks for raising this PR. A few items:

  1. Please rebase your changes onto master. I just merged an improvement to schema fetching for bigquery, so master is ahead of this commit.
  2. How does the query annotation boolean interact with the Use Standard SQL option?
  3. From the PR description: "there is a need to identify users who are executing very heavy queries from the query execution history." --> I don't see how the annotations in this PR would be traceable back to a specific user in Redash. What did I miss?
  4. This PR appears to allow a user to specify legacy vs standard sql on a per-query basis by adding a #legacySQL or #standardSQL to the top of the query text. Is this the standard convention with BigQuery?

susodapop avatar Dec 15 '21 04:12 susodapop

@susodapop Thanks for cheking out my PR!

  1. Please rebase your changes onto master. I just merged an improvement to schema fetching for bigquery, so master is ahead of this commit.

done

2.How does the query annotation boolean interact with the Use Standard SQL option?

The Use Query Annotation annotation option is not related to the Use Standard SQL option.

  1. From the PR description: "there is a need to identify users who are executing very heavy queries from the query execution history." --> I don't see how the annotations in this PR would be traceable back to a specific user in Redash. What did I miss?

Although it is not shown in the test, you can tell from the SQL comment who executed which query because it will have a comment like the following.

/* Username: [email protected], query_id: adhoc, Queue: queries, Job ID: 813c588e-649f-462b-b316-7db5db9f6959, Query Hash: f6bf37efedbc0a2dfffc1caf5088d86e, Scheduled: False */
select 1

https://github.com/getredash/redash/blob/d8d7c78992e44a4b6d7fdd4c39ccc1c35cd8c7a9/redash/handlers/query_results.py#L121-L125

This PR appears to allow a user to specify legacy vs standard sql on a per-query basis by adding a #legacySQL or #standardSQL to the top of the query text. Is this the standard convention with BigQuery?

I'm sorry. This was my misunderstanding of the specifications. #standardSQL and #legacySQL are only effective for UI execution, and are ignored when executed via API. I have removed this setting in an additional commit.

htamakos avatar Dec 15 '21 06:12 htamakos

@susodapop Hi, our team wants this feature. What should I do to merge this PR?

htamakos avatar Oct 25 '22 04:10 htamakos

@susodapop Hello @susodapop

I hope this message finds you well. I am reaching out to inquire about the status of my pull request #5675. It has been a considerable amount of time since I submitted the pull request, and unfortunately, it has not yet been reviewed or merged.

I understand that you may have other priorities, but I would greatly appreciate it if you could take the time to review and possibly merge this pull request as soon as possible. This would allow me to continue contributing to the project and ensure that our team can move forward with this feature.

Please let me know if there are any concerns or issues with my pull request that are preventing it from being merged. I am open to feedback and willing to make any necessary changes.

Thank you for your time and consideration.

htamakos avatar Apr 10 '23 02:04 htamakos

@htamakos , thanks for the PR! I'm sorry about the lack of response of the original maintainers. We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

guidopetri avatar Aug 20 '23 22:08 guidopetri

The concept of this sounds useful for BigQuery users, so I've just updated this PR to our master branch and fixed the formatting problems reported by our backend lint tests.

Lets see if it passes our CI tests as-is. :smile:

justinclift avatar Aug 22 '23 22:08 justinclift

Codecov Report

Merging #5675 (69be3e3) into master (4107265) will increase coverage by 0.02%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5675      +/-   ##
==========================================
+ Coverage   60.70%   60.72%   +0.02%     
==========================================
  Files         154      154              
  Lines       12624    12626       +2     
  Branches     1716     1716              
==========================================
+ Hits         7663     7667       +4     
+ Misses       4735     4734       -1     
+ Partials      226      225       -1     
Files Changed Coverage Δ
redash/query_runner/big_query.py 23.52% <100.00%> (+0.82%) :arrow_up:

... and 1 file with indirect coverage changes

codecov[bot] avatar Aug 22 '23 22:08 codecov[bot]

@guidopetri Any interest in reviewing this? I'm pretty sure it'll be fairly straight forward. :smile:

justinclift avatar Aug 22 '23 23:08 justinclift

Sure, I can try taking a look this weekend.

guidopetri avatar Aug 22 '23 23:08 guidopetri