redash
redash copied to clipboard
Add useQueryAnnotation option to BigQuery Runner
What type of PR is this? (check all applicable)
- [x] Feature
Description
I have added the useQueryAnnotation option to the BigQuery Query Runner.
This option controls whether Query Annotaion is done or not at query execution time. When initializing the redash.query_runner.big_query.BigQuery instance, set should_annotate_query to True if useQueryAnnotation is True, otherwise set it to False If not, set it to False.
~~### Implementation notes~~
~~BigQuery can be controlled whether to execute legacy or standard SQL by setting the leading comment to #legacySQL or #starndardSQL.~~
~~Reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix~~
~~Therefore, the annotate_query method is used to determine if there is a query prefix, and if so, it is prefixed with the query prefix and given a query annotation after it~~
Why do we need this option?
When using BigQuery with redash, it is likely that multiple users will use the service account for redash via the data source. In that case, there is a need to identify users who are executing very heavy queries from the query execution history. In the absence of query annotations, this identification is very difficult. However, it may not be necessary for existing users, so we have disabled the option by default so that it will not affect existing users.
Related Tickets & Documents
Mobile & Desktop Screenshots/Recordings (if there are UI changes)
Thanks for raising this PR. A few items:
- Please rebase your changes onto master. I just merged an improvement to schema fetching for bigquery, so
masteris ahead of this commit. - How does the query annotation boolean interact with the
Use Standard SQLoption? - From the PR description: "there is a need to identify users who are executing very heavy queries from the query execution history." --> I don't see how the annotations in this PR would be traceable back to a specific user in Redash. What did I miss?
- This PR appears to allow a user to specify legacy vs standard sql on a per-query basis by adding a
#legacySQLor#standardSQLto the top of the query text. Is this the standard convention with BigQuery?
@susodapop Thanks for cheking out my PR!
- Please rebase your changes onto master. I just merged an improvement to schema fetching for bigquery, so master is ahead of this commit.
done
2.How does the query annotation boolean interact with the Use Standard SQL option?
The Use Query Annotation annotation option is not related to the Use Standard SQL option.
- From the PR description: "there is a need to identify users who are executing very heavy queries from the query execution history." --> I don't see how the annotations in this PR would be traceable back to a specific user in Redash. What did I miss?
Although it is not shown in the test, you can tell from the SQL comment who executed which query because it will have a comment like the following.
/* Username: [email protected], query_id: adhoc, Queue: queries, Job ID: 813c588e-649f-462b-b316-7db5db9f6959, Query Hash: f6bf37efedbc0a2dfffc1caf5088d86e, Scheduled: False */
select 1
https://github.com/getredash/redash/blob/d8d7c78992e44a4b6d7fdd4c39ccc1c35cd8c7a9/redash/handlers/query_results.py#L121-L125
This PR appears to allow a user to specify legacy vs standard sql on a per-query basis by adding a #legacySQL or #standardSQL to the top of the query text. Is this the standard convention with BigQuery?
I'm sorry. This was my misunderstanding of the specifications. #standardSQL and #legacySQL are only effective for UI execution, and are ignored when executed via API. I have removed this setting in an additional commit.
@susodapop Hi, our team wants this feature. What should I do to merge this PR?
@susodapop Hello @susodapop
I hope this message finds you well. I am reaching out to inquire about the status of my pull request #5675. It has been a considerable amount of time since I submitted the pull request, and unfortunately, it has not yet been reviewed or merged.
I understand that you may have other priorities, but I would greatly appreciate it if you could take the time to review and possibly merge this pull request as soon as possible. This would allow me to continue contributing to the project and ensure that our team can move forward with this feature.
Please let me know if there are any concerns or issues with my pull request that are preventing it from being merged. I am open to feedback and willing to make any necessary changes.
Thank you for your time and consideration.
@htamakos , thanks for the PR! I'm sorry about the lack of response of the original maintainers. We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI?
We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.
The concept of this sounds useful for BigQuery users, so I've just updated this PR to our master branch and fixed the formatting problems reported by our backend lint tests.
Lets see if it passes our CI tests as-is. :smile:
Codecov Report
Merging #5675 (69be3e3) into master (4107265) will increase coverage by
0.02%. The diff coverage is100.00%.
@@ Coverage Diff @@
## master #5675 +/- ##
==========================================
+ Coverage 60.70% 60.72% +0.02%
==========================================
Files 154 154
Lines 12624 12626 +2
Branches 1716 1716
==========================================
+ Hits 7663 7667 +4
+ Misses 4735 4734 -1
+ Partials 226 225 -1
| Files Changed | Coverage Δ | |
|---|---|---|
| redash/query_runner/big_query.py | 23.52% <100.00%> (+0.82%) |
:arrow_up: |
@guidopetri Any interest in reviewing this? I'm pretty sure it'll be fairly straight forward. :smile:
Sure, I can try taking a look this weekend.