superset feat(sqllab): Add timeout on fetching query results

SUMMARY

This commit adds the timeout on fetching query results. Additionally, a 'Retry without timeout' option is provided so that when a timeout issue occurs, users can attempt to proceed by ignoring the error.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

https://github.com/user-attachments/assets/af5dfc12-4072-484a-94d8-57b3258b3163

TESTING INSTRUCTIONS

SET SQLLAB_QUERY_RESULT_TIMEOUT with a small timeout Run a query async Check the error message and then click "Retry" to get the result without timeout errors

ADDITIONAL INFORMATION

[ ] Has associated issue:
[ ] Required feature flags:
[ ] Changes UI
[ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
[ ] Introduces new feature or API
[ ] Removes existing feature or API

Aug 16 '24 10:08 justinpark

Codecov Report

:x: Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 72.52%. Comparing base (76d897e) to head (65cd417). :warning: Report is 2419 commits behind head on master.

Files with missing lines	Patch %	Lines
...frontend/src/SqlLab/components/ResultSet/index.tsx	75.00%	1 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #29959       +/-   ##
===========================================
+ Coverage   60.48%   72.52%   +12.03%     
===========================================
  Files        1931     1976       +45     
  Lines       76236    79638     +3402     
  Branches     8568     9072      +504     
===========================================
+ Hits        46114    57759    +11645     
+ Misses      28017    21155     -6862     
+ Partials     2105      724     -1381

Flag	Coverage Δ
hive	`48.87% <100.00%> (-0.29%)`	:arrow_down:
javascript	`62.12% <90.00%> (∅)`
mysql	`76.66% <100.00%> (?)`
postgres	`76.73% <100.00%> (?)`
presto	`53.38% <100.00%> (-0.42%)`	:arrow_down:
python	`83.69% <100.00%> (+20.18%)`	:arrow_up:
sqlite	`76.18% <100.00%> (?)`
unit	`60.44% <100.00%> (+2.81%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aug 16 '24 10:08 codecov[bot]

@justinpark we currently have

# Timeout duration for SQL Lab synchronous queries
SQLLAB_TIMEOUT = int(timedelta(seconds=30).total_seconds())

# The MAX duration a query can run for before being killed by celery.
SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=6).total_seconds())

Can't we reuse one of these for this? I'm thinking that we could expand the scope of SQLLAB_TIMEOUT (changing the comment) to be valid for both sync and async queries.

Aug 16 '24 13:08 michael-s-molina

Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't https://github.com/apache/superset/pull/29956 be sufficient to handle these cases?

Aug 16 '24 13:08 michael-s-molina

Can't we reuse one of these for this? I'm thinking that we could expand the scope of SQLLAB_TIMEOUT (changing the comment) to be valid for both sync and async queries.

@michael-s-molina QUERY_RESULT_TIMEOUT is different from the existing SQLLAB_TIMEOUT and SQLLAB_ASYNC_TIME_LIMIT_SEC. This refers to a request that occurs in an ASYNC QUERY, which is different from the actual query execution time. It is a request to retrieve the completed query result from the cache. In formula form, it can be expressed as follows:

SQLLAB_TIMEOUT = SQLLAB_ASYNC_TIME_LIMIT_SEC + SQLLAB_QUERY_RESULT_TIMEOUT

The reason for delays related to the timeout is due to locking caused by updates to the query table, which leads to delays in querying the information linked to the resultsKey of the query ID. To address this issue, a separate timeout is configured.

Another question is about retry without time limit. Isn't this dangerous and probably inaccurate? I'm assuming all database engines will impose timeouts and this setting will not bypass this? I'm also assuming that a timeout was set for a reason, so giving users the ability to override this seems dangerous. If you're thinking about zombie frontend requests, wouldn't https://github.com/apache/superset/pull/29956 be sufficient to handle these cases?

The locking delay in synchronizing query status was resolved with PR 29956. This patch also addresses the intermittent issue of the fetching state freezing (when the timer stops and results are retrieved) that occasionally occurred after the query status was confirmed as successful.

Making a request without a timeout applies the max request time related on the HTTP server, which is the same as the existing result retrieval process. The new addition is simply an update to the UX, where pending states are cut off slightly earlier than the HTTP server’s max request timeout. If a request requires more time than the configured timeout, users can re-initiate the request using the original method, depending on their preference.

Aug 19 '24 08:08 justinpark

/testenv up

Aug 21 '24 15:08 mistercrunch

@mistercrunch Ephemeral environment spinning up at http://34.215.131.40:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

Aug 21 '24 15:08 github-actions[bot]

Screenshot 2024-09-11 at 12 13 32 PM Updated the button label to "Retry fetching results" which will run as-is (which means no overrides of timeout)

Sep 11 '24 19:09 justinpark

Ephemeral environment shutdown and build artifacts deleted.

Sep 12 '24 16:09 github-actions[bot]

@sadpandajoe Could you add this fix on 4.1?

Sep 12 '24 16:09 justinpark