
PdfAsync executes indefinitely for a particular URL

Bobrovsky opened this issue on May 05 '21 · 4 comments

Description

The PdfAsync method executes indefinitely for https://css-tricks.com/thispagedoesntexist. I waited for several minutes even though the default timeout is 30 seconds. At the same time, the ScreenshotAsync method has no problem processing the same URL.

Complete minimal example reproducing the issue

E.g.

// Download the browser and launch it headless.
var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Navigate to a URL that returns a 404 page.
await page.GoToAsync("https://css-tricks.com/thispagedoesntexist");

// ScreenshotAsync completes fine for the same URL:
//await page.ScreenshotAsync("out.png");

// PdfAsync never returns:
await page.PdfAsync("out.pdf");
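
Incidentally, the URL above returns a 404. As a defensive measure (not something discussed in this thread), the navigation Response returned by GoToAsync could be inspected before attempting the PDF; a rough sketch:

// Sketch only: skip PDF generation when navigation did not return a successful status.
var response = await page.GoToAsync("https://css-tricks.com/thispagedoesntexist");
if (response == null || !response.Ok)
{
    Console.WriteLine($"Navigation returned {response?.Status}; skipping PdfAsync.");
}
else
{
    await page.PdfAsync("out.pdf");
}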

Expected behavior:

I would expect the PdfAsync method to complete in a few seconds (as ScreenshotAsync does), or at least to throw an exception after the 30-second default timeout.

Actual behavior:

The method keeps executing indefinitely. No exceptions.
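
One way a caller can at least avoid awaiting forever is to race PdfAsync against a delay. This is only a sketch of a generic .NET pattern, not a fix offered in this thread: the hung Chromium operation is not cancelled, and the browser still has to be closed or killed separately. The helper name and the timeout value are illustrative.

using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class PdfHelpers
{
    // Sketch: bound the wait on PdfAsync from the outside. The PDF operation itself
    // is not cancelled; the caller still has to dispose (or kill) the browser.
    public static async Task RenderPdfWithTimeoutAsync(Page page, string path, TimeSpan timeout)
    {
        var pdfTask = page.PdfAsync(path);
        var completed = await Task.WhenAny(pdfTask, Task.Delay(timeout));

        if (completed != pdfTask)
        {
            throw new TimeoutException($"PdfAsync did not complete within {timeout}.");
        }

        await pdfTask; // propagate any exception thrown by PdfAsync
    }
}

Calling this with a 30-second timeout would throw instead of hanging, but note that the stuck browser process stays alive, which is what the process-killing approach later in this thread addresses.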

Versions

I am using PuppeteerSharp 4.0.0 in a console application, targeting .NET Core 3.1 (netcoreapp3.1).

Bobrovsky · May 05 '21 13:05

Did you find a workaround for this issue, @Bobrovsky? We're seeing the same thing here, and it's causing serious problems by tying up our server whenever a report happens to take too long to process.

WhatFreshHellIsThis · Oct 25 '21 23:10

@WhatFreshHellIsThis No, I never found a workaround for this.

Bobrovsky · Oct 26 '21 04:10

I also encountered this problem. Do you have any solutions?

kfj1688 · Jan 16 '22 04:01

@kfj1688 Yes, I ended up not relying on the headless browser's timeouts at all and built a system to deal with this: when my ASP.NET Core server starts the headless browser to process a PDF render job, I record a timestamp and the process ID of the started browser process in a ConcurrentBag collection, and when the job completes and the browser is closed, I remove that record from the bag.

Separately, a recurring background job in my job system periodically examines the contents of the ConcurrentBag and looks for processes that have exceeded the allowable time limit (a user-configurable setting). If such a process is still running, it is killed by its process ID and its record is removed from the bag; if the process is already gone, the record is simply removed.

This compensates for the inherent "sloppiness" in the system by making it more deterministic, and so far it works on both Linux and Windows.
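
A minimal sketch of the watchdog described above, under a few assumptions: it tracks jobs in a ConcurrentDictionary keyed by process ID rather than a ConcurrentBag (removal is simpler that way), the BrowserWatchdog name and time limit are made up, and the browser's process ID is assumed to come from the launched Chromium process (for example Browser.Process.Id in PuppeteerSharp, if available in your version).

using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

// Sketch only: tracks launched browser processes and kills the ones that
// outlive a configurable time limit.
public class BrowserWatchdog
{
    private readonly ConcurrentDictionary<int, DateTime> _jobs =
        new ConcurrentDictionary<int, DateTime>();
    private readonly TimeSpan _maxJobDuration;

    public BrowserWatchdog(TimeSpan maxJobDuration)
    {
        _maxJobDuration = maxJobDuration;
    }

    // Call when a render job launches a browser process.
    public void Register(int processId) => _jobs[processId] = DateTime.UtcNow;

    // Call when a render job finishes and the browser has been closed.
    public void Unregister(int processId) => _jobs.TryRemove(processId, out _);

    // Run this periodically from a recurring background job.
    public void KillExpiredJobs()
    {
        var now = DateTime.UtcNow;
        foreach (var job in _jobs.Where(j => now - j.Value > _maxJobDuration).ToList())
        {
            try
            {
                // Kill the browser if it is still running (works on Windows and Linux on .NET Core 3.0+)...
                Process.GetProcessById(job.Key).Kill(entireProcessTree: true);
            }
            catch (ArgumentException)
            {
                // ...or ignore it if the process has already exited.
            }
            Unregister(job.Key);
        }
    }
}

In this sketch, Register would be called right after LaunchAsync, Unregister after the browser is closed, and KillExpiredJobs from whatever recurring job scheduler the server already uses.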

WhatFreshHellIsThis · Jan 17 '22 17:01