puppeteer-sharp
puppeteer-sharp copied to clipboard
WaitForNavigationAsync should resolve if Frame already has requested lifecycle
I expect that calling:
await page.WaitForNavigationAsync();
would resolve immediately if the page have already collected 'load' lifecycle event before. Also, I expect the following sequence should complete successfully instead of hanging forever as it does currently with version 5.1.0 of puppeteer-sharp:
await page.GoToAsync(url, WaitUntilNavigation.DOMContentLoaded);
await page.WaitForNavigationAsync();
As a workaround for my project I slightly modified FrameManager.cs lines 149-164 to be:
public async Task<Response> WaitForFrameNavigationAsync(Frame frame, NavigationOptions options = null)
{
var timeout = options?.Timeout ?? TimeoutSettings.NavigationTimeout;
using (var watcher = new LifecycleWatcher(this, frame, options?.WaitUntil, timeout))
{
var raceTask = await Task.WhenAny(
watcher.NewDocumentNavigationTask,
watcher.SameDocumentNavigationTask,
watcher.LifecycleTask, // <-- add this line to resolve lifecycle events without waiting for actual navigation
watcher.TimeoutOrTerminationTask).ConfigureAwait(false);
await raceTask.ConfigureAwait(false);
return watcher.NavigationResponse;
}
}
Please check if it may be acceptable fix for public, or maybe a separate method for waiting for lifecycle events without doing actual navigation should be added.
The most common pattern is doing something like this:
await Task.WhenAll(
Page.ClickAsync("a"),
Page.WaitForNavigationAsync());
I think that with the change you're suggesting, WaitForNavigationAsync will immediately be resolved, breaking that pattern.
Then, maybe a separate method like WaitForLifecycleEventAsync() can be introduced? Alternatively, Frame could expose its LifecycleEvents list publicly or provide a method to check if it already contains a specified lifecycle event?
If you do await page.GoToAsync(url, WaitUntilNavigation.DOMContentLoaded); that will wait for WaitUntilNavigation.DOMContentLoaded, you won't need an extra wait.
Well, maybe I should explain my use case or rather the problem I want to address. I need to regularly monitor a set of websites, load their pages, make some analysis, and take screenshots. To make things fast I run several threads of processing. The problem is that 'load' event is quite often is not caught. Either it is not sent by Chromium under heavy load or there are some problems with websock transport or something else. However, DOMContentLoaded event is 100% caught even under heavy load. I can do all important work right after DOMContentLoaded event, however I obviously would like to take a screenshot only after 'load' event, not earlier. And because 'load' event is not reliable I am ok to wait for it for 10 sec and then proceed with a screenshot anyway. So WaitForLifecycleEventAsync() with its own timeout would be best for me. Hope it helps.
All these things you mention are after a GoToAsync or another kind of action?
High chances you can reproduce the issue with this simple test:
class Program
{
static async Task Main(string[] args)
{
await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = false });
var sites = new[] { "news.yahoo.com", "news.google.com", "cnn.com", "wsj.com", "news.bbc.co.uk", "latimes.com" };
Task.WaitAll(sites.Select(x => Test(browser, "https://" + x)).ToArray());
}
static async Task Test(Browser browser, string url)
{
await using var page = await browser.NewPageAsync();
await page.SetRequestInterceptionAsync(true);
page.Request += async (s, e) => await e.Request.ContinueAsync();
page.DOMContentLoaded += (s, e) => Debug.WriteLine($"DOMContentLoaded {page.Url} [{url}]");
page.Load += (s, e) => Debug.WriteLine($"Load {page.Url} [{url}]");
await page.GoToAsync(url);
}
}
You'll catch all DOMContentloaded but only few Load events...