Reports are being sent multiple times
On Superset 1.5.1 we have about 50 reports enabled, most of which are sent daily. Sometimes a report is sent out multiple times for no apparent reason (see the screenshot of the Report Execution Log), even though it is scheduled to be sent only once a day (see screenshot).
This behavior has been seen on different reports on different days, with no apparent pattern. Just randomly, it seems that Superset decides to send out a report several times. The only notable circumstance is that this happens after the scheduled time.
How to reproduce the bug
- Set up a report with a schedule for once a day
Expected results
Each report is sent out according to schedule
Actual results
Some reports are sent out multiple times at random (but always around the scheduled time)
Screenshots
[Screenshot: Bildschirmfoto 2023-01-10 um 16 31 47]
[Screenshot: Bildschirmfoto 2023-01-09 um 16 38 14]
Environment
(please complete the following information):
- browser type and version: Any browser
- superset version: 1.5.1
Checklist
Make sure to follow these steps before submitting your issue - thank you!
- [ ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
- [ ] I have reproduced the issue with at least the latest released version of superset.
- [x] I have checked the issue tracker for the same issue and I haven't found one similar.
Additional context
Add any other context about the problem here.
@mattitoo This is not related to your problem, but could you tell me whether you are able to send reports of charts as PNG with this version of Superset?
@devaale Yes, it works nicely.
@mattitoo Could you by any chance share your cloned Superset repo as a public repository of your own? I'm having difficulty getting a PNG through the API with the master branch.
For organisational reasons I can not do that right now. But we did not implement any changes on the Reporting features or API whatsoever.
Understandable. So no changes are required in the superset_config.py file either for the API endpoint api/v1/chart/{pk}/cache_screenshot, or for reports that send a chart as PNG, to work, right? @mattitoo Thank you, I'll try out version 1.5.1.
I am facing this issue too. On some days the report is sent out more than once, and on other days it is not sent at all.
We found out that this is a timing issue when generating reports. Every 60 seconds there is a check for reports that are scheduled but not yet finished. That check does not take into account whether a report has already been triggered and is still running. So if a report takes too long, it is not yet registered as finished and is triggered again. The first run is then sent out once it finishes, and so is the second one.
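To make the timing problem described above concrete, here is a toy model in Python. It is only a sketch of the behavior as described in this comment, not Superset's scheduler code; the function name `simulate_report_sends` and all of its parameters are made up for illustration.

```python
def simulate_report_sends(check_interval, runtime, horizon, skip_if_running):
    """Return the finish time of every run that gets triggered (toy model).

    Every triggered run eventually sends the report, so the length of the
    returned list is the number of e-mails the recipients receive.
    """
    finish_times = []
    for now in range(0, horizon, check_interval):
        finished = any(t <= now for t in finish_times)
        in_progress = any(t > now for t in finish_times)
        if finished:
            break  # the report is registered as finished; stop triggering
        if skip_if_running and in_progress:
            continue  # a run is already under way, so do not trigger again
        finish_times.append(now + runtime)
    return finish_times


# A report that needs 150 s, checked every 60 s, is triggered three times
# when the check ignores runs that are still in progress.
print(simulate_report_sends(60, 150, 600, skip_if_running=False))
# With an "is it already running?" guard, only one run is triggered.
print(simulate_report_sends(60, 150, 600, skip_if_running=True))
```

In this model, a report that finishes within one check interval is sent once in either mode, which would match the observation that only some reports are affected.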
@mattitoo Could you please suggest? Do you have any idea how to fix it?
Is this related to a caching timeout in Redis or the Celery worker? I found a related Celery issue about workers duplicating tasks, https://github.com/celery/celery/issues/3270, where a comment suggests extending Celery's VISIBILITY_TIMEOUT setting, but I'm not sure where we could apply it in Superset.
We just upgraded to Superset 2.0.1 and wanted to see if the problem persists there. If it does, we will have a look at a possible fix.
@mattitoo We are facing the same issue. Did the upgrade to 2.0.1 solve your problem?
No, unfortunately this still happens.
For us, this was caused by the visibility timeout being lower than the time the task takes to complete. Increasing the SQS queue's visibility timeout stopped the duplicates.
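A rough way to picture this: with a visibility-timeout broker, a message that has not been acknowledged within the timeout is handed out again. The Python sketch below counts deliveries under the simplifying assumption that the message is acknowledged only when the first worker finishes. The function name `count_deliveries` is hypothetical, and real behavior depends on the broker and on Celery's acknowledgement settings.

```python
def count_deliveries(task_runtime, visibility_timeout):
    """Count how often one message is delivered before it is acknowledged.

    Toy model: the first worker acknowledges after `task_runtime` seconds,
    and the broker redelivers every `visibility_timeout` seconds until then.
    """
    if visibility_timeout <= 0:
        raise ValueError("visibility_timeout must be positive")
    deliveries = 1
    next_redelivery = visibility_timeout
    while next_redelivery < task_runtime:
        deliveries += 1
        next_redelivery += visibility_timeout
    return deliveries


# A 90-minute task with a 1-hour timeout is delivered twice in this model.
print(count_deliveries(task_runtime=5400, visibility_timeout=3600))
# Raising the timeout above the runtime brings it back to a single delivery.
print(count_deliveries(task_runtime=5400, visibility_timeout=18000))
```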
Hi, we also tried extending VISIBILITY_TIMEOUT for Celery and it resolved this issue. The reports stopped being duplicated or skipped. Thank you @mdeshmu for your suggestion and thank you all for the discussion on this issue. Cheers!
I'm tempted to close this as completed based on what I'm reading... is there anything that needs to be added to the docs and/or comments in config files so we can rest easier about doing so?
Hey @rusackas, as @unnyns-307 mentioned, we resolved this issue by appending the following line to the config.py file: broker_transport_options = {'visibility_timeout': 18000}. The snippet below might be helpful for other users experiencing the same issue.
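class CeleryConfig(object):
    # REDIS_HOST and REDIS_PORT are assumed to be defined earlier in the config file
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    # visibility_timeout is in seconds: 18000 s = 5 hours
    broker_transport_options = {'visibility_timeout': 18000}
    imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"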
Hi @rusackas , @unnyns-307 , @zhaoyongjie , can someone help me?
I'm using the setting that @unnyns-307 mentioned, but it does not resolve this issue:
broker_transport_options = {'visibility_timeout': 18000}
please help.
I looked at the Celery docs on Visibility Timeout, and they say the default value is one hour. I take that to mean that you would only encounter the problematic behavior in this issue if your reports sometimes take over an hour to run. Can anyone confirm that was the situation, or give a runtime value for one of the reports that was being sent in a loop?
I think this should be added to the Alerts & Reports documentation, but not the default config.py. One hour seems like a reasonable default. The "caveats" section of those docs says that increasing this value can have the negative side effect of reports being excessively delayed if Celery is restarted.
Would anyone in this thread be willing to contribute a brief PR to the Alerts & Reports docs page so we can close this? I think it should be a commented-out line of code in the config showing how to set a valid value in a way that works with the current version of Superset, and then another comment above it saying something like:
# if you have long-running reports that are being resent in a loop, extend the visibility timeout per https://github.com/apache/superset/issues/22664
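As a starting point for such a PR, the commented-out setting could look roughly like the sketch below. The Redis URLs are placeholders, the value 18000 is simply the one reported earlier in this thread, and whether this exact form works with the current version of Superset would still need to be verified.

```python
# superset_config.py (sketch; the Redis URLs are placeholders)
class CeleryConfig:
    broker_url = "redis://localhost:6379/0"
    result_backend = "redis://localhost:6379/0"
    imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
    # If you have long-running reports that are being resent in a loop,
    # extend the visibility timeout per
    # https://github.com/apache/superset/issues/22664
    # broker_transport_options = {"visibility_timeout": 18000}  # seconds (5 hours)


CELERY_CONFIG = CeleryConfig
```

Leaving the line commented out keeps Celery's default in place, so only deployments with long-running reports would need to opt in.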