
Prefect nightly task regularly times out waiting for dashboard to say it's reloading

Open nichhk opened this issue 2 years ago • 4 comments

Overview

See prior context here. This isn't high priority now that we are aware of this bug, but we'd like to have confidence in our error reporting (i.e., if our logs indicate an error, it's actually an error).

Here is where the dashboard handles a reload request.

Here is where the nightly task issues the reload request and waits for the dashboard to indicate that it's reloading.

Action Items

  • [ ] Try increasing the timeout to 2min

nichhk • Jun 27 '22 21:06

I took a closer look at this. When you manually tell the dashboard server to reload (i.e., visit dashboard_url/reload), it happens almost instantaneously, and we get the result that we're expecting.

So I'm less confident that extending the timeout will solve the issue. The bug is more likely in how we verify that the reload succeeded. Currently, Prefect opens a browser, navigates to dashboard_url/reload, and waits for an HTML element with a specific ID to appear on the page. This seems quite overengineered. Previous context: #935.

I think making a simple web request will be less complicated and work more reliably since we won't need to rely on a browser automation library, so I'll try that. This is what we do to reload the cache on the server.
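A minimal sketch of the plain-request idea, using only the Python standard library. The function name, the 120-second default, and the assumption that a successful reload answers with HTTP 200 are all illustrative and not taken from the repo; only the dashboard_url/reload path comes from this thread.

```python
import urllib.error
import urllib.request


def request_reload(dashboard_url, timeout_s=120, opener=urllib.request.urlopen):
    """Ask the dashboard to reload with a plain GET; return (ok, body).

    `opener` is injectable so the function can be exercised without a server.
    """
    url = dashboard_url.rstrip("/") + "/reload"
    try:
        with opener(url, timeout=timeout_s) as response:
            body = response.read().decode("utf-8", errors="replace")
            # Assumption: the reload endpoint answers 200 on success.
            return response.status == 200, body
    except (urllib.error.URLError, TimeoutError) as exc:
        # Report why the request failed instead of raising a bare timeout.
        return False, f"reload request failed: {exc}"
```

Returning the body alongside the status lets the task log what the server actually sent, which helps tell a real failure from a verification bug.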

nichhk • Aug 07 '22 17:08

Actually, according to #1028, we need to run a browser. The bug has very little detail, but apparently they first tried the approach from my previous comment and it didn't work. So I will continue investigating the existing solution.

nichhk • Aug 07 '22 18:08

It doesn't seem to be timing out anymore, probably thanks to #1379. But Prefect still reports that the reload is failing, probably because it can't find the reloading message in the page content. I checked the Lightsail logs, and the report server is indeed being reloaded, so I'm going to add a print statement to the Prefect task to see what content it's getting from the reload page.
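A rough sketch of that debugging step: check the page content for the reloading message and, when it is missing, print a truncated copy of what was actually received. The helper name, the "reloading" marker text, and the 500-character limit are hypothetical; the real message and element ID live in the dashboard code.

```python
def reload_confirmed(page_content, marker="reloading", snippet_len=500, log=print):
    """Return True if `marker` appears in the page; otherwise log a snippet."""
    found = marker.lower() in page_content.lower()
    if not found:
        # Truncate so a large HTML page does not flood the task logs.
        snippet = page_content[:snippet_len]
        log(f"marker {marker!r} not found; page starts with: {snippet!r}")
    return found
```

Logging only on failure keeps successful nightly runs quiet while still capturing the evidence when the check disagrees with the server logs.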

nichhk • Oct 24 '22 01:10

It timed out again for last night's run. We might need to add a call to waitForNavigation? https://stackoverflow.com/a/58298172
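One way this could look, assuming the browser automation library is pyppeteer (the thread does not name it; waitForNavigation, goto, waitForSelector, and content are pyppeteer Page methods). The function name, the element ID argument, and the 2-minute default are illustrative. The idea is to start waiting for the navigation before triggering it, so the event cannot be missed.

```python
import asyncio


async def reload_and_wait(page, reload_url, element_id, timeout_ms=120_000):
    """Open the reload URL, wait for navigation, then wait for the element."""
    options = {"timeout": timeout_ms}
    # Schedule the wait first, then the navigation that satisfies it.
    await asyncio.gather(
        page.waitForNavigation(options),
        page.goto(reload_url, options),
    )
    # Only after navigation settles, look for the element by its ID.
    await page.waitForSelector(f"#{element_id}", options)
    return await page.content()
```

Whether the extra wait actually removes the timeout is untested here; it only rules out the case where the element check runs before the page has finished navigating.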

nichhk • Oct 25 '22 21:10