Report Celery Job is always scheduled 12 hours ahead
Bug description
The problem is as stated in the title: no matter what timezone I select, the report job gets scheduled 12 hours ahead.
In the Superset UI, the timezone displayed on the created report job is always different from the one I selected (typically GMT -6):
In Flower, I can see the jobs, and their ETA looks correct, but the job isn't executed at that time; it runs exactly 12 hours later:
I'm running Superset/Celery on Ubuntu 22.04 with Python 3.11, directly on the host rather than in Docker.
Here's my celery config:
class CeleryConfig:
    """
    Celery worker configuration
    """

    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    worker_prefetch_multiplier = 10
    task_protocol = 2
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "tasks.add": {
            "rate_limit": "10/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 120,
            "soft_time_limit": 150,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute="0", hour="0"),
        },
    }
    timezone = "America/Denver"
    enable_utc = False
How to reproduce the bug
1. Turn on Alerts & Reports.
2. Add a new report and select GMT -6 as the timezone.
3. Create the job.
4. Check the worker logs: the job won't execute for another 12 hours.
Screenshots/recordings
Celery worker logs:
Superset version
3.1.2
Python version
3.11
Node version
Not applicable
Browser
Chrome
Additional context
No response
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Hi @nathan-gilbert, did you find a solution to this problem? I have the same issue. Thanks for your comment.
@tridoxx No, I have found no solution yet.
It looks like the issue you're experiencing with Celery jobs in Superset being scheduled 12 hours ahead could be due to a timezone misalignment, particularly in how timezones are handled and converted within Superset's scheduling logic. This misalignment might stem from the timezone setting in the Superset UI not being correctly recognized or applied when calculating the next execution times for jobs.
To address this, ensure that the timezone you set in the Superset UI is correctly formatted and recognized by the `pytz` library, as any incorrect or unrecognized timezone will default to UTC. Additionally, you might need to review and possibly customize the scheduling logic in Superset, especially if the default behavior doesn't align with your expectations. This could involve adjusting how datetime values are localized to the specified timezone and ensuring accurate conversion back to UTC if necessary.
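The fallback described above can be sketched with the standard library. This is only an illustration, not Superset's code: it uses `zoneinfo` instead of `pytz` to stay dependency-free, and `resolve_timezone` is a hypothetical helper name.

```python
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def resolve_timezone(name: str) -> tzinfo:
    """Return the named zone, or UTC if the name is not recognized.

    Hypothetical helper: mirrors the "unknown zone falls back to UTC"
    behavior described in the thread, using zoneinfo rather than pytz.
    """
    try:
        return ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        # Unknown or malformed names silently become UTC.
        return timezone.utc


# A valid IANA name resolves to that zone when the tz database is installed.
print(resolve_timezone("America/Denver"))
# A display label such as "GMT -6" is not an IANA zone name, so it falls back to UTC.
print(resolve_timezone("GMT -6"))
```

A silent fallback like this is worth checking for, because a job would then be scheduled in UTC without any visible error.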
There's also a related issue and suggested solution that involves replacing `datetime.utcnow()` with `datetime.now(timezone.utc)` in the scheduler code, which has been discussed in the Superset GitHub repository. This change aims to correct the timezone handling and might be a good starting point for resolving the scheduling discrepancies you're facing.
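To see why the naive/aware distinction matters, here is a small standalone sketch. It assumes a fixed-offset stand-in for "GMT -6" and a hard-coded timestamp; it does not reproduce Superset's or Celery's actual code paths. `datetime.utcnow()` returns a naive value (no `tzinfo`), so any consumer that assumes local time will shift the instant.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for "GMT -6" (an assumption for illustration;
# a real deployment would use a named zone such as America/Denver).
GMT_MINUS_6 = timezone(timedelta(hours=-6))

# An aware UTC timestamp, like the result of datetime.now(timezone.utc).
aware_utc = datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
# The same wall-clock value without tzinfo, like the result of datetime.utcnow().
naive_utc = aware_utc.replace(tzinfo=None)

# The aware value converts correctly: 18:00 UTC is 12:00 at GMT -6.
print(aware_utc.astimezone(GMT_MINUS_6))  # 2024-05-01 12:00:00-06:00

# The naive value carries no offset. If a consumer assumes it is already
# local GMT -6 time, the instant moves 6 hours later than intended.
misread = naive_utc.replace(tzinfo=GMT_MINUS_6)
print(misread - aware_utc)  # 6:00:00
```

The sketch only shows a single misinterpretation; whether one or more such shifts explain the 12-hour delay in this issue is not established here.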
If after these steps you're still encountering issues, expanding logging to include key variables like `triggered_at`, `start_at`, `stop_at`, and the final scheduled times can help identify where the mismatches occur, providing further insight into the problem and aiding in troubleshooting.
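A minimal sketch of that kind of logging follows. The logger name, the `describe_window` helper, and the window centred on `triggered_at` are all assumptions made for illustration; they are not taken from Superset's source.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("report_schedule_debug")  # hypothetical logger name


def describe_window(triggered_at: datetime, window_seconds: int = 60) -> dict:
    """Compute a window centred on triggered_at and log each bound.

    Hypothetical helper: the window shape is an assumption, used only to
    show which values are worth logging.
    """
    half = timedelta(seconds=window_seconds / 2)
    values = {
        "triggered_at": triggered_at,
        "start_at": triggered_at - half,
        "stop_at": triggered_at + half,
    }
    for name, value in values.items():
        # isoformat() includes the UTC offset for aware values and omits it
        # for naive ones, so a missing tzinfo is easy to spot in the log.
        logger.info("%s=%s tzinfo=%s", name, value.isoformat(), value.tzinfo)
    return values


logging.basicConfig(level=logging.INFO)
describe_window(datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc))
```

Logging `tzinfo` next to each value makes it clear at which step a datetime loses or changes its offset.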
Sources
- Timezone Misalignment in Apache Superset 3.0.1 Alert&Reports
- superset/tasks/scheduler.py
- superset/tasks/cron_util.py
Hi @nathan-gilbert, I made this correction. First you need to modify the file `superset/tasks/scheduler.py`. In my case I am using Docker, and for some strange reason I had to modify this file inside the "superset_worker" container, not the "superset_app" container. I edited it as follows: add `import pytz` and `utc_now = datetime.utcnow()`, then comment out the line `async_options = {"eta": schedule}` and replace it with `async_options = {"eta": utc_now}`.
Something like this.
Then reboot the machine. This worked for me: Superset now takes the real UTC time and subtracts the correct offset for the timezone defined on the alert in the Superset app.
It is working for me now without problems. To verify, set the alert's crontab in Superset to run every minute and check the logs with `docker logs superset_worker --since 1h` (if you are using the "superset_worker" Docker container).