superset icon indicating copy to clipboard operation
superset copied to clipboard

Alerts & Reports - CSV as Message Content resulting HTTP 500

Open nivdann opened this issue 1 year ago • 1 comments

Bug description

I managed to enable my Alerts & Reports as mentioned in documentation, BUT when I try to use the chart report as CSV to send to email, it fails with HTTP 500 as shown in the worker logs.

Alerts and Snapshots work and are sent to my email as expected. The only thing that doesn't work and fails is CSV charts.

How to reproduce the bug

  1. Go to Alerts & Reports
  2. Add new alert (i added a "select 1;" query with trigger condition != 0 to have the trigger, and also choose a example chart with CSV option)
  3. Wait for trigger

Screenshots/recordings

image image

Superset version

3.1.3

Python version

3.10

Node version

I don't know

Browser

Not applicable

Additional context

I using helm to deploy to my EKS running AWS. Using Firefox and geckodriver (defaults).

Superset Config

 ROW_LIMIT = 5000
 ENABLE_PROXY_FIX = True
 ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
FEATURE_FLAGS = {
     "DYNAMIC_PLUGINS": True,
     "ALLOW_ADHOC_SUBQUERY": True,
     "ENABLE_TEMPLATE_PROCESSING": True,
     "DASHBOARD_VIRTUALIZATION": True,
     "DRILL_BY": True,
     "GLOBAL_ASYNC_QUERIES": False,
     "ALERT_REPORTS": True
}
SMTP_HOST = os.getenv("SMTP_HOST")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL"))
SMTP_SSL_SERVER_AUTH = ast.literal_eval(os.getenv("SMTP_SSL_SERVER_AUTH"))
SMTP_PORT = os.getenv("SMTP_PORT")
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM")
SMTP_USER = ""
SMTP_PASSWORD = ""


from celery.schedules import crontab

class CeleryConfig:
    broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.cache",
        "superset.tasks.scheduler",
    )
    result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }
CELERY_CONFIG = CeleryConfig


LOG_LEVEL = 'DEBUG'
SCREENSHOT_LOCATE_WAIT = 100
SCREENSHOT_LOAD_WAIT = 600
EMAIL_PAGE_RENDER_WAIT = 60
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}"
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://superset:8088"


SUPERSET_WEBSERVER_TIMEOUT = 300


from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.FIREFOX.copy()
capabilities['proxy'] = {
    "proxyType": "DIRECT" # Direct connection, no proxy
}
WEBDRIVER_CONFIGURATION = {"capabilities": capabilities}
WEBDRIVER_OPTION_ARGS = ["--headless"]

SECRET_KEY = os.getenv("SUPERSET_SECRET_KEY")

Worker logs

[2024-05-19 10:58:00,074: INFO/ForkPoolWorker-1] Scheduling alert test eta: 2024-05-19 10:58:00 [2024-05-19 10:58:00,081: DEBUG/ForkPoolWorker-1] [stats_logger] (incr) reports.execute [2024-05-19 10:58:00,081: INFO/ForkPoolWorker-1] Executing alert/report, task id: 39937d56-4280-4e16-bde8-011841a62d90, scheduled_dttm: 2024-05-19T10:58:00 [2024-05-19 10:58:00,082: INFO/ForkPoolWorker-1] session is validated: id 15, executionid: 39937d56-4280-4e16-bde8-011841a62d90 [2024-05-19 10:58:00,149: INFO/ForkPoolWorker-1] Running report schedule 39937d56-4280-4e16-bde8-011841a62d90 as user admin [2024-05-19 10:58:00,385: DEBUG/ForkPoolWorker-1] Parsing with sqlparse statement: select 1; [2024-05-19 10:58:00,394: DEBUG/ForkPoolWorker-1] Database._get_sqla_engine(). Masked URL: awsathena+rest://athena.us-west-2.amazonaws.com/superset_db?s3_staging_dir=s3%3A%2F%2Fdata%2Fquery_results%2F&work_group=primary [2024-05-19 10:58:00,397: DEBUG/ForkPoolWorker-1] Database._get_sqla_engine(). Masked URL: awsathena+rest://athena.us-west-2.amazonaws.com/superset_db?s3_staging_dir=s3%3A%2F%2Fdata%2Fquery_results%2F&work_group=primary [2024-05-19 10:58:01,943: INFO/ForkPoolWorker-1] Query for test took 1552.98 ms [2024-05-19 10:58:01,963: INFO/ForkPoolWorker-1] Getting chart from http://superset:8088/api/v1/chart/2/data/?format=csv&type=post_processed&force=true as user admin [2024-05-19 10:58:02,074: INFO/ForkPoolWorker-1] header_data in notifications for alerts and reports {'notification_type': 'Alert', 'notification_source': <ReportSourceFormat.CHART: 'chart'>, 'notification_format': 'CSV', 'chart_id': 2, 'dashboard_id': None, 'owners': [Superset Admin]}, taskid, 39937d56-4280-4e16-bde8-011841a62d90 [2024-05-19 10:58:02,164: DEBUG/ForkPoolWorker-1] Sent an email to ['[email protected]'] [2024-05-19 10:58:02,352: INFO/ForkPoolWorker-1] Report sent to email, notification content is None [2024-05-19 10:58:02,352: DEBUG/ForkPoolWorker-1] [stats_logger] (gauge) reports.email.send.ok1 [2024-05-19 10:58:02,376: ERROR/ForkPoolWorker-1] A downstream exception occurred while generating a report: 39937d56-4280-4e16-bde8-011841a62d90. Failed generating csv HTTP Error 500: Internal Server Error Traceback (most recent call last): File "/app/superset/commands/report/execute.py", line 258, in _get_csv_data csv_data = get_chart_csv_data(chart_url=url, auth_cookies=auth_cookies) File "/app/superset/utils/csv.py", line 92, in get_chart_csv_data response = opener.open(chart_url) File "/usr/local/lib/python3.10/urllib/request.py", line 525, in open response = meth(req, response) File "/usr/local/lib/python3.10/urllib/request.py", line 634, in http_response response = self.parent.error( File "/usr/local/lib/python3.10/urllib/request.py", line 563, in error return self._call_chain(*args) File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain result = func(*args) File "/usr/local/lib/python3.10/urllib/request.py", line 643, in http_error_default raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 500: Internal Server Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/app/superset/tasks/scheduler.py", line 98, in execute ).run() File "/app/superset/commands/report/execute.py", line 729, in run raise ex File "/app/superset/commands/report/execute.py", line 727, in run ).run() File "/app/superset/commands/report/execute.py", line 689, in run ).next() File "/app/superset/commands/report/execute.py", line 586, in next raise first_ex File "/app/superset/commands/report/execute.py", line 555, in next self.send() File "/app/superset/commands/report/execute.py", line 452, in send notification_content = self._get_notification_content() File "/app/superset/commands/report/execute.py", line 364, in _get_notification_content csv_data = self._get_csv_data() File "/app/superset/commands/report/execute.py", line 262, in _get_csv_data raise ReportScheduleCsvFailedError( superset.commands.report.exceptions.ReportScheduleCsvFailedError: Failed generating csv HTTP Error 500: Internal Server Error

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

nivdann avatar May 19 '24 11:05 nivdann

The HTTP 500 error you're encountering when trying to send chart data as CSV via email in Apache Superset seems to be related to the process of fetching and generating the CSV data from the chart URL. Here are some targeted steps to troubleshoot and potentially resolve this issue:

  1. Verify Chart URL Accessibility: Ensure the chart URL is accessible and that any required authentication is correctly handled. The error could stem from issues in fetching data due to authentication or accessibility problems.

  2. Data Handling in CSV Generation: Check the process of converting chart data to a pandas DataFrame and the subsequent CSV conversion. Ensure that this process does not introduce errors, especially with complex data types or large datasets.

  3. CSV Content and Email Attachment: Confirm that the generated CSV content is correctly formatted and encoded for email attachment. Malformed or empty CSV data could lead to failures in the email sending process.

  4. Logging and Error Handling: Enhance logging around the CSV generation and email sending process to capture detailed information about the failure. This can provide insights into what part of the process is causing the HTTP 500 error.

  5. Configuration and Permissions: Review Superset and server configurations related to CSV export and email sending. Ensure there are no misconfigurations or permission issues that could be causing the problem.

  6. Superset Updates: Make sure your Superset installation is up to date. If this issue is known, it might have been addressed in a more recent version of Superset.

  7. Integration Tests for Insights: The tests/integration_tests/reports/commands_tests.py file in the Superset codebase contains tests related to report scheduling and sending, including CSV content. Reviewing these tests might offer clues to resolving your issue.

For further assistance, you might want to follow the discussion Alerts & Reports - CSV as Message Content resulting HTTP 500 for any updates or solutions from the community.

Details

Sources


About Dosu This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

dosubot[bot] avatar May 19 '24 11:05 dosubot[bot]

Hello, did you find a solution ? we are facing the same issue.

many thanks G

gaful avatar Jun 06 '24 13:06 gaful

Hello to add a little more details :

We also faced some issue like that

  1. with large csv files (>10M)
  2. with charts containing null value -> I don't know why the charts export works fine, but the reports doesn't work when charts contains null value.

For our case :

  • Reports were working on 3.1 version
  • We are facing the issue since 4.0.1 migration
  • The issue is only with csv file, while the same charts works perfectly fine in txt , it is not a charts issue size.

cdlt G

gaful avatar Jun 06 '24 16:06 gaful

I'm experiencing this also, under 4.0.0. I can send chart reports as a PNG image, but sending as .csv fails with HTTP Error 500: INTERNAL SERVER ERROR. Here is a stack trace from the worker:

A downstream exception occurred while generating a report: 60e1a3b3-dace-4db0-a080-47905e5bec3a. Failed generating csv HTTP Error 500: INTERNAL SERVER ERROR
Traceback (most recent call last):
  File "/app/superset/commands/report/execute.py", line 256, in _get_csv_data
    csv_data = get_chart_csv_data(chart_url=url, auth_cookies=auth_cookies)
  File "/app/superset/utils/csv.py", line 92, in get_chart_csv_data
    response = opener.open(chart_url)
  File "/usr/local/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR

sfirke avatar Jun 12 '24 16:06 sfirke

I managed to resolve this issue by removing the http proxy environment variable that was set in my values.yaml as part of the command for my installations need. The problem was that after I set a proxy as an environment variable in that step and don't remove it, at the end of my installations of the command(Under supersetWorker/supersetNode), it calls to start the app as follows: . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker

so the environment variable (my http proxy) is set also for the app and when it tries to use the WEBDRIVER_BASEURL or the URL to the service, it tries to use proxy and not directly - therefore it gets a timeout/error.

Also after figuring that out, I removed from this from my values.yaml since its not really needed:

from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities.FIREFOX.copy()
capabilities['proxy'] = {
    "proxyType": "DIRECT" # Direct connection, no proxy
}
WEBDRIVER_CONFIGURATION = {"capabilities": capabilities}

Eventually all screenshots, csv, worked as expected for my case (sounds like yours is different)

Hope it helps :)

nivdann avatar Jun 18 '24 14:06 nivdann