puppeteer-sharp icon indicating copy to clipboard operation
puppeteer-sharp copied to clipboard

WaitForNavigationAsync should resolve if Frame already has requested lifecycle

Open igristov opened this issue 4 years ago • 6 comments

I expect that calling:

await page.WaitForNavigationAsync();

would resolve immediately if the page have already collected 'load' lifecycle event before. Also, I expect the following sequence should complete successfully instead of hanging forever as it does currently with version 5.1.0 of puppeteer-sharp:

await page.GoToAsync(url, WaitUntilNavigation.DOMContentLoaded);
await page.WaitForNavigationAsync();

As a workaround for my project I slightly modified FrameManager.cs lines 149-164 to be:

        public async Task<Response> WaitForFrameNavigationAsync(Frame frame, NavigationOptions options = null)
        {
            var timeout = options?.Timeout ?? TimeoutSettings.NavigationTimeout;
            using (var watcher = new LifecycleWatcher(this, frame, options?.WaitUntil, timeout))
            {
                var raceTask = await Task.WhenAny(
                    watcher.NewDocumentNavigationTask,
                    watcher.SameDocumentNavigationTask,
                    watcher.LifecycleTask, // <-- add this line to resolve lifecycle events without waiting for actual navigation
                    watcher.TimeoutOrTerminationTask).ConfigureAwait(false);

                await raceTask.ConfigureAwait(false);

                return watcher.NavigationResponse;
            }
        }

Please check if it may be acceptable fix for public, or maybe a separate method for waiting for lifecycle events without doing actual navigation should be added.

igristov avatar Oct 07 '21 13:10 igristov

The most common pattern is doing something like this:

await Task.WhenAll(
    Page.ClickAsync("a"),
    Page.WaitForNavigationAsync());

I think that with the change you're suggesting, WaitForNavigationAsync will immediately be resolved, breaking that pattern.

kblok avatar Oct 08 '21 12:10 kblok

Then, maybe a separate method like WaitForLifecycleEventAsync() can be introduced? Alternatively, Frame could expose its LifecycleEvents list publicly or provide a method to check if it already contains a specified lifecycle event?

igristov avatar Oct 08 '21 13:10 igristov

If you do await page.GoToAsync(url, WaitUntilNavigation.DOMContentLoaded); that will wait for WaitUntilNavigation.DOMContentLoaded, you won't need an extra wait.

kblok avatar Oct 08 '21 18:10 kblok

Well, maybe I should explain my use case or rather the problem I want to address. I need to regularly monitor a set of websites, load their pages, make some analysis, and take screenshots. To make things fast I run several threads of processing. The problem is that 'load' event is quite often is not caught. Either it is not sent by Chromium under heavy load or there are some problems with websock transport or something else. However, DOMContentLoaded event is 100% caught even under heavy load. I can do all important work right after DOMContentLoaded event, however I obviously would like to take a screenshot only after 'load' event, not earlier. And because 'load' event is not reliable I am ok to wait for it for 10 sec and then proceed with a screenshot anyway. So WaitForLifecycleEventAsync() with its own timeout would be best for me. Hope it helps.

igristov avatar Oct 08 '21 20:10 igristov

All these things you mention are after a GoToAsync or another kind of action?

kblok avatar Oct 11 '21 12:10 kblok

High chances you can reproduce the issue with this simple test:

class Program
{
    static async Task Main(string[] args)
    {
        await new BrowserFetcher().DownloadAsync();
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = false });
        var sites = new[] { "news.yahoo.com", "news.google.com", "cnn.com", "wsj.com", "news.bbc.co.uk", "latimes.com" };
        Task.WaitAll(sites.Select(x => Test(browser, "https://" + x)).ToArray());
    }
    static async Task Test(Browser browser, string url)
    {
        await using var page = await browser.NewPageAsync();
        await page.SetRequestInterceptionAsync(true);
        page.Request += async (s, e) => await e.Request.ContinueAsync();
        page.DOMContentLoaded += (s, e) => Debug.WriteLine($"DOMContentLoaded {page.Url} [{url}]");
        page.Load += (s, e) => Debug.WriteLine($"Load {page.Url} [{url}]");
        await page.GoToAsync(url);
    }
}

You'll catch all DOMContentloaded but only few Load events...

igristov avatar Oct 12 '21 06:10 igristov