feat(sqllab): Add timeout on fetching query results
SUMMARY
This commit adds the timeout on fetching query results. Additionally, a 'Retry without timeout' option is provided so that when a timeout issue occurs, users can attempt to proceed by ignoring the error.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
https://github.com/user-attachments/assets/af5dfc12-4072-484a-94d8-57b3258b3163
TESTING INSTRUCTIONS
SET SQLLAB_QUERY_RESULT_TIMEOUT with a small timeout Run a query async Check the error message and then click "Retry" to get the result without timeout errors
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Codecov Report
:x: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 72.52%. Comparing base (76d897e) to head (65cd417).
:warning: Report is 2419 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...frontend/src/SqlLab/components/ResultSet/index.tsx | 75.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #29959 +/- ##
===========================================
+ Coverage 60.48% 72.52% +12.03%
===========================================
Files 1931 1976 +45
Lines 76236 79638 +3402
Branches 8568 9072 +504
===========================================
+ Hits 46114 57759 +11645
+ Misses 28017 21155 -6862
+ Partials 2105 724 -1381
| Flag | Coverage Δ | |
|---|---|---|
| hive | 48.87% <100.00%> (-0.29%) |
:arrow_down: |
| javascript | 62.12% <90.00%> (∅) |
|
| mysql | 76.66% <100.00%> (?) |
|
| postgres | 76.73% <100.00%> (?) |
|
| presto | 53.38% <100.00%> (-0.42%) |
:arrow_down: |
| python | 83.69% <100.00%> (+20.18%) |
:arrow_up: |
| sqlite | 76.18% <100.00%> (?) |
|
| unit | 60.44% <100.00%> (+2.81%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@justinpark we currently have
# Timeout duration for SQL Lab synchronous queries
SQLLAB_TIMEOUT = int(timedelta(seconds=30).total_seconds())
# The MAX duration a query can run for before being killed by celery.
SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=6).total_seconds())
Can't we reuse one of these for this? I'm thinking that we could expand the scope of SQLLAB_TIMEOUT (changing the comment) to be valid for both sync and async queries.
Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't https://github.com/apache/superset/pull/29956 be sufficient to handle these cases?
Can't we reuse one of these for this? I'm thinking that we could expand the scope of
SQLLAB_TIMEOUT(changing the comment) to be valid for both sync and async queries.
@michael-s-molina QUERY_RESULT_TIMEOUT is different from the existing SQLLAB_TIMEOUT and SQLLAB_ASYNC_TIME_LIMIT_SEC. This refers to a request that occurs in an ASYNC QUERY, which is different from the actual query execution time. It is a request to retrieve the completed query result from the cache. In formula form, it can be expressed as follows:
SQLLAB_TIMEOUT = SQLLAB_ASYNC_TIME_LIMIT_SEC + SQLLAB_QUERY_RESULT_TIMEOUT
The reason for delays related to the timeout is due to locking caused by updates to the query table, which leads to delays in querying the information linked to the resultsKey of the query ID. To address this issue, a separate timeout is configured.
Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't https://github.com/apache/superset/pull/29956 be sufficient to handle these cases?
The locking delay in synchronizing query status was resolved with PR 29956. This patch also addresses the intermittent issue of the fetching state freezing (when the timer stops and results are retrieved) that occasionally occurred after the query status was confirmed as successful.
Making a request without a timeout applies the max request time related on the HTTP server, which is the same as the existing result retrieval process. The new addition is simply an update to the UX, where pending states are cut off slightly earlier than the HTTP server’s max request timeout. If a request requires more time than the configured timeout, users can re-initiate the request using the original method, depending on their preference.
/testenv up
@mistercrunch Ephemeral environment spinning up at http://34.215.131.40:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.
Updated the button label to "Retry fetching results" which will run as-is (which means no overrides of timeout)
Ephemeral environment shutdown and build artifacts deleted.
@sadpandajoe Could you add this fix on 4.1?