superset icon indicating copy to clipboard operation
superset copied to clipboard

fix: Chart cache-warmup task fails on Superset 4.0

Open rmasters opened this issue 1 year ago • 5 comments

SUMMARY

Fixes #28705

We found our chart cache warm-up task (as in the example in the Kubernetes docs) was failing with this error:

[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading strategy
[2024-05-24 22:30:00,045: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Loading TopNDashboardsStrategy
[2024-05-24 22:30:00,046: INFO/ForkPoolWorker-1] cache-warmup[ae85f28c-acfa-4bb1-a885-e7f0121610b9]: Success!
HTTPException
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1788, in dispatch_request
    self.raise_routing_exception(req)
File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1770, in raise_routing_exception
    raise request.routing_exception # type: ignore
File "/usr/local/lib/python3.10/site-packages/flask/ctx.py", line 351, in match_request
    result = self.url_adapter.match(return_rule=True) # type: ignore
File "/usr/local/lib/python3.10/site-packages/werkzeug/routing/map.py", line 619, in match
    raise MethodNotAllowed(valid_methods=list(e.have_match_for)) from None
werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL.
[repeated]

On investigation I found that:

  • The cache-warmup task calls PUT /superset/warm_up_caches/ (via the fetch_url task)
  • /superset/warm_up_caches/ (Superset.warm_up_caches) exists but is a GET request
  • That endpoint is deprecated by /api/v1/chart/warm_up_caches (ChartRestApi.warm_up_cache) (scheduled for deletion in 4.0 but I couldn't find discussions around that)

This PR changes the task to call the newer endpoint (accidentally reverted in 56069b05f9cf4d0c725d1b4b0ad6038b50837cd4).

I also ran into CSRF issues (similar to discussion in #27160), so added a method to fetch this (thanks to Will Gan on slack for the hint). Perhaps this can be placed more centrally? This appears to work on installations with CSRF disabled (I've tested with WTF_CSRF_ENABLED = False).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

  1. Ensure you have a scheduled job to warmup caches, e.g. by adding to CeleryConfig:
     class CeleryConfig:
         broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    -    imports = ("superset.sql_lab",)
    +    imports = (
    +        "superset.sql_lab",
    +        "superset.tasks.cache",
    +    )
         result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
         worker_prefetch_multiplier = 1
         task_acks_late = False
    @@ -87,6 +90,13 @@ class CeleryConfig:
                 "task": "reports.prune_log",
                 "schedule": crontab(minute=10, hour=0),
             },
    +        "cache-warmup-dummy": {
    +            "task": "cache-warmup",
    +            "schedule": crontab(minute="*/1", hour="*"),
    +            "kwargs": {
    +                "strategy_name": "dummy",
    +            },
    +        },
         }
    
  2. If starting from an empty installation, create a dashboard, with at least one data-backed chart, and publish it
  3. Each minute you should see log lines like:
    superset_worker        | [2024-05-25 00:18:03,203: INFO/ForkPoolWorker-1] fetch_url[b9a9c5c6-41e0-491f-a8cb-d49c0b9ab396]: Fetching http://superset_app:8088/api/v1/security/csrf_token/
    superset_app           | 2024-05-25 00:18:03,198:DEBUG:superset.stats_logger:[stats_logger] (timing) ChartRestApi.warm_up_cache.time | 176.14193800545763
    superset_app           | 2024-05-25 00:18:03,200:INFO:werkzeug:192.168.165.7 - - [25/May/2024 00:18:03] "PUT /api/v1/chart/warm_up_cache HTTP/1.1" 200 -
    
    as compared to this on master:
    superset_worker        | [2024-05-25 00:26:00,162: INFO/MainProcess] Task fetch_url[a488cfb6-1d63-42bb-be00-a3e6af3e2c4a] received
    superset_worker        | [2024-05-25 00:26:00,165: INFO/ForkPoolWorker-1] fetch_url[9a054a67-acf0-4fd8-b0dd-42ab28da886b]: Fetching http://superset_app:8088/superset/warm_up_cache/ with payload {"chart_id": 156}
    superset_app           | 2024-05-25 00:26:00,168:WARNING:superset.views.base:HTTPException
    superset_app           | Traceback (most recent call last):
    superset_app           |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
    superset_app           |     rv = self.dispatch_request()
    superset_app           |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1458, in dispatch_request
    superset_app           |     self.raise_routing_exception(req)
    superset_app           |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1440, in raise_routing_exception
    superset_worker        | [2024-05-25 00:26:00,167: INFO/ForkPoolWorker-7] cache-warmup[57b6894a-e9a2-4fba-bf8e-2e7d74826f61]: Scheduling {"chart_id": 156}
    superset_app           |     raise request.routing_exception  # type: ignore
    superset_app           |   File "/usr/local/lib/python3.10/site-packages/flask/ctx.py", line 353, in match_request
    superset_app           |     result = self.url_adapter.match(return_rule=True)  # type: ignore
    superset_app           |   File "/usr/local/lib/python3.10/site-packages/werkzeug/routing/map.py", line 624, in match
    superset_app           |     raise MethodNotAllowed(valid_methods=list(e.have_match_for)) from None
    superset_app           | werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL.
    superset_app           | 2024-05-25 00:26:00,170:INFO:werkzeug:192.168.165.8 - - [25/May/2024 00:26:00] "PUT /superset/warm_up_cache/ HTTP/1.1" 405 -
    

A clearer example on my clean install (based on 4.0.0):

superset_worker_beat  | [2024-05-24 23:38:00,065: INFO/MainProcess] Scheduler: Sending due task cache-warmup-hourly (cache-warmup)
superset_worker       | [2024-05-24 23:38:00,084: INFO/MainProcess] Task cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40] received
superset_worker       | [2024-05-24 23:38:00,093: INFO/ForkPoolWorker-7] cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40]: Loading strategy
superset_worker       | [2024-05-24 23:38:00,094: INFO/ForkPoolWorker-7] cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40]: Loading DummyStrategy
superset_worker       | [2024-05-24 23:38:00,095: INFO/ForkPoolWorker-7] cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40]: Success!
superset_worker       | [2024-05-24 23:38:00,161: INFO/ForkPoolWorker-7] cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40]: Scheduling {"chart_id": 1}
superset_worker       | [2024-05-24 23:38:00,181: INFO/MainProcess] Task fetch_url[efd6a561-9c01-455c-aaae-bf8d4e6fb88e] received
superset_worker       | [2024-05-24 23:38:00,185: INFO/ForkPoolWorker-7] Task cache-warmup[c8b9a20b-3fea-45e5-a9fc-9a2fdcfa7b40] succeeded in 0.09269619999395218s: {'scheduled': ['{"chart_id": 1}'], 'errors': []}
superset_worker       | [2024-05-24 23:38:00,202: INFO/ForkPoolWorker-8] fetch_url[efd6a561-9c01-455c-aaae-bf8d4e6fb88e]: Fetching http://webserver.superset.local:8088/api/v1/security/csrf_token/
superset_app          | 2024-05-24 23:38:00,259:INFO:werkzeug:192.168.214.6 - - [24/May/2024 23:38:00] "GET /api/v1/security/csrf_token/ HTTP/1.1" 200 -
superset_worker       | [2024-05-24 23:38:00,261: INFO/ForkPoolWorker-8] fetch_url[efd6a561-9c01-455c-aaae-bf8d4e6fb88e]: Fetching http://webserver.superset.local:8088/api/v1/chart/warm_up_cache with payload {"chart_id": 1}
superset_app          | 2024-05-24 23:38:02,332:INFO:werkzeug:192.168.214.6 - - [24/May/2024 23:38:02] "PUT /api/v1/chart/warm_up_cache HTTP/1.1" 200 -
superset_worker       | [2024-05-24 23:38:02,333: INFO/ForkPoolWorker-8] fetch_url[efd6a561-9c01-455c-aaae-bf8d4e6fb88e]: Fetched http://webserver.superset.local:8088/api/v1/chart/warm_up_cache with payload {"chart_id": 1}, status code: 200
superset_worker       | [2024-05-24 23:38:02,335: INFO/ForkPoolWorker-8] Task fetch_url[efd6a561-9c01-455c-aaae-bf8d4e6fb88e] succeeded in 2.1522701479989337s: {'success': '{"chart_id": 1}', 'response': '{"result":[{"chart_id":1,"viz_error":null,"viz_status":"success"}]}

ADDITIONAL INFORMATION

  • [x] Has associated issue: #28705
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

rmasters avatar May 25 '24 00:05 rmasters

@gooroodev please review

admsev avatar May 25 '24 15:05 admsev

Summary of Changes

  1. New Functionality:

    • Added a new function fetch_csrf_token to fetch a CSRF token for API requests.
    • Integrated the fetch_csrf_token function into the fetch_url task to ensure that CSRF tokens are included in the headers for API requests.
  2. Modifications:

    • Updated the fetch_url method to use ChartRestApi.warm_up_cache instead of Superset.warm_up_cache.
    • Modified the integration test test_fetch_url to include tests for the new fetch_csrf_token function.
  3. Tests:

    • Added a new test test_fetch_csrf_token to verify the functionality of fetching CSRF tokens.
    • Updated the test_fetch_url to mock and validate the behavior of fetch_csrf_token.

Issues, Bugs, or Typos

  1. Error Handling in fetch_csrf_token:

    • The function fetch_csrf_token logs an error but does not raise an exception or handle the error case effectively. This can lead to silent failures.
  2. Variable Naming:

    • The variable session_cookie should be checked for None before being added to the headers to avoid potential issues if the cookie is not set.
  3. Test Improvements:

    • The test test_fetch_csrf_token should include a case where the response code is not 200 to ensure error handling is tested.

Proposed Code Improvements

Improvement 1: Error Handling in fetch_csrf_token

Current Code:

logger.error("Error fetching CSRF token, status code: %s", response.code)
return {}

Improved Code:

logger.error("Error fetching CSRF token, status code: %s", response.code)
raise Exception(f"Failed to fetch CSRF token, status code: {response.code}")

Improvement 2: Check for None in session_cookie

Current Code:

session_cookie = response.headers.get("Set-Cookie")
if response.code == 200:
    data = json.loads(body)
    return {"Cookie": session_cookie, "X-CSRF-Token": data["result"]}

Improved Code:

session_cookie = response.headers.get("Set-Cookie")
if response.code == 200 and session_cookie:
    data = json.loads(body)
    return {"Cookie": session_cookie, "X-CSRF-Token": data["result"]}
else:
    logger.error("Error fetching CSRF token or session cookie missing, status code: %s", response.code)
    raise Exception(f"Failed to fetch CSRF token or session cookie missing, status code: {response.code}")

Improvement 3: Test for Non-200 Response Code

Current Test:

def test_fetch_csrf_token(mock_urlopen, mock_request_cls, base_url, app_context):
    # ... existing test code ...

Improved Test:

def test_fetch_csrf_token(mock_urlopen, mock_request_cls, base_url, app_context):
    from superset.tasks.cache import fetch_csrf_token

    mock_request = mock.MagicMock()
    mock_request_cls.return_value = mock_request

    mock_response = mock.MagicMock()
    mock_urlopen.return_value.__enter__.return_value = mock_response

    # Test for successful response
    mock_response.code = 200
    mock_response.read.return_value = b'{"result": "csrf_token"}'
    mock_response.headers.get.return_value = "new_session_cookie"

    app.config["WEBDRIVER_BASEURL"] = base_url
    headers = {"Cookie": "original_session_cookie"}

    result_headers = fetch_csrf_token(headers)

    mock_request_cls.assert_called_with(
        "http://base-url/api/v1/security/csrf_token/",
        headers=headers,
        method="GET",
    )

    assert result_headers["X-CSRF-Token"] == "csrf_token"
    assert result_headers["Cookie"] == "new_session_cookie"
    mock_urlopen.assert_called_once_with(mock_request, timeout=mock.ANY)

    # Test for unsuccessful response
    mock_response.code = 500
    with pytest.raises(Exception):
        fetch_csrf_token(headers)

General Review of Code Quality and Style

  • Code Quality: The code is generally well-structured and follows good practices. The addition of the fetch_csrf_token function abstracts the CSRF token fetching logic, making the fetch_url function cleaner and more maintainable.
  • Code Style: The code follows PEP 8 guidelines, and the use of type hints improves readability and maintainability.
  • Logging: The use of logging is appropriate for debugging and tracking the flow of the program, but error handling could be more robust.
  • Testing: The tests cover the new functionality well, but additional edge cases and error conditions should be tested to ensure robustness.

Yours, Gooroo.dev. To receive reviews automatically, install Github App

gooroodev avatar May 25 '24 15:05 gooroodev

Codecov Report

Attention: Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.65%. Comparing base (76d897e) to head (620502d). Report is 1701 commits behind head on master.

Files with missing lines Patch % Lines
superset/tasks/utils.py 93.75% 2 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28706       +/-   ##
===========================================
+ Coverage   60.48%   83.65%   +23.16%     
===========================================
  Files        1931      528     -1403     
  Lines       76236    38181    -38055     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31941    -14173     
+ Misses      28017     6240    -21777     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.93% <31.42%> (-0.22%) :arrow_down:
javascript ?
mysql 76.66% <94.28%> (?)
postgres 76.75% <94.28%> (?)
presto 53.48% <31.42%> (-0.32%) :arrow_down:
python 83.65% <94.28%> (+20.15%) :arrow_up:
sqlite 76.24% <94.28%> (?)
unit 60.28% <28.57%> (+2.65%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar May 29 '24 14:05 codecov[bot]

Is there anything I can do to help get this merged?

I'm also interested in working cache warmups

gempir avatar Jun 24 '24 07:06 gempir

  • PEP

Feel free to give it a code review, and/or answer any of the open questions on the thread.

Still hoping to get an opinion from @betodealmeida and/or @dpgaspar , but I'll add @john-bodley to the mix as well.

rusackas avatar Jun 24 '24 17:06 rusackas

I also want to use cache-warmup in Superset 4.0.2. When is it planned to include this fix? Thank you!

cmartinezdnn avatar Jul 04 '24 08:07 cmartinezdnn

Thanks for providing solution, however this wouldn't work if someone is using superset-websockets as in the headers there are two "Set-Cookie"'s first one is an async-token and second session token, your code will get the first one which is wrong and you'll get 404 UNATHORIZED

talfour avatar Jul 07 '24 16:07 talfour

I also want to use cache-warmup in Superset 4.0.2. When is it planned to include this fix? Thank you!

I installed the version 4.0.2, and applied the fix suggested by @rmasters in the cache.py file. And now it's working.

Before the fix, I was getting the 405 error.

fernandoadam avatar Jul 11 '24 14:07 fernandoadam

Would love to close the related issue, but I'm not sure of the risks involved here (particularly with websockets added to the mix). Maybe @villebro or @nytai would have a look here and see if this seems save to merge.

rusackas avatar Jul 30 '24 22:07 rusackas

@rmasters this needs a rebase, and I don't have permissions to push changes to your branch to do it. Can you rebase, and optionally make that utils refactor?

villebro avatar Aug 02 '24 22:08 villebro

@cmartinezdnn Have you tested that condition??

jacobeatsspam avatar Aug 05 '24 20:08 jacobeatsspam

Looks like some CI checks aren't running for whatever reason. Might be in need of the aforementioned rebase, but I'll close and reopen this to see if that kick-starts it.

rusackas avatar Aug 07 '24 16:08 rusackas

You think that this could be merged soon? What I need to change in the code to fix that issue manually?

shohamyamin avatar Aug 13 '24 14:08 shohamyamin

I also want to use cache-warmup in Superset 4.0.2. When is it planned to include this fix? Thank you!

I installed the version 4.0.2, and applied the fix suggested by @rmasters in the cache.py file. And now it's working.

Before the fix, I was getting the 405 error.

Hi, can you please deliberate on what did you change so it will work for you? I'm encountering the same issue

tamarinkeisari avatar Aug 14 '24 11:08 tamarinkeisari

Yeah... any traction on this would be amazing

jacobeatsspam avatar Aug 20 '24 20:08 jacobeatsspam

Apologies for the radio silence; I don't seem to be getting github notifications through to my email...

Many thanks for the reviews. I'll rebase and pop that refactor to utils in shortly.


@talfour Thanks for providing solution, however this wouldn't work if someone is using superset-websockets as in the headers there are two "Set-Cookie"'s first one is an async-token and second session token, your code will get the first one which is wrong and you'll get 404 UNATHORIZED

I'm not familiar with the websocket server, but am I right in understanding that it should not pick up the async/websocket-server cookie, or does it proxy requests at all? It doesn't appear that the FAB session cookie name can be changed from session so I can filter for that (it has been years since I used FAB though so if anyone knows different please lmk).

Edit: I've popped a change in that parses out all the set-cookie headers and picks the one named session - hopefully this does the trick

rmasters avatar Aug 20 '24 21:08 rmasters

Hey @villebro, I've rebased onto master, split out the csrf-token fetcher and made a fix towards the session cookie grabber, if you can trigger the CI for me please 🚀

rmasters avatar Aug 20 '24 23:08 rmasters

@rmasters thanks for the rebase. Any changes we need to add in terms of ingress annotation for the chart?

fzhan avatar Sep 06 '24 01:09 fzhan

I've started CI and will try to review this tomorrow (PST)

villebro avatar Sep 06 '24 03:09 villebro

Waiting for flaky tests to rerun, will try to merge as soon as CI passes.

villebro avatar Sep 06 '24 16:09 villebro

One last flaky test to go.. 🤦

villebro avatar Sep 06 '24 16:09 villebro

Hi in which version this PR is going to be released?

tamarinkeisari avatar Sep 24 '24 09:09 tamarinkeisari