
The time allocated for FrameManager to initialize is not enough (1 second)

AleksandrsJakovlevsVisma opened this issue 1 year ago • 5 comments

Description

We are trying to use Puppeteer-Sharp to convert HTML to PDF. At times we receive up to 200 conversion requests simultaneously. At first, we tried to create a browser and a page for each request but quickly realized that this was a bad idea performance-wise. Our new strategy is to use a pool of pages, which means we always use one browser and allow a maximum of 10 pages to be open.

Still, it does not work. At some point, when a new page is created, we get this exception:

PuppeteerSharp.TargetClosedException: Protocol error (Performance.enable): Session closed. Most likely the Page has been closed.Close reason: Connection failed to process Page.frameStoppedLoading. Timeout of 1000 ms exceeded.

Coupled with stuff like this:

PuppeteerSharp.TargetClosedException: Protocol error(Target.createTarget): Target closed. (Connection failed to process {"id":14,"result":{},"sessionId":"BED668EF013F206DE76E55157C4B06BC"}. Object reference not set to an instance of an object.

Reading the code, my guess is that the following is happening:

  1. With such a high number of requests being done at the same time, the server performance is slowed down
  2. At some point, when creating a new page, FrameManager.InitializeAsync is not able to be completed within one second
  3. The connection is then closed with the "Timeout of 1000 ms exceeded" error message
  4. The rest of the requests fail since the browser they were using has a closed connection

Complete minimal example reproducing the issue

This is an approximation of what the pool looks like. Calling GetPageAsync from multiple threads will eventually lead to the above-mentioned exceptions.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using PuppeteerSharp;

namespace Test;

public class ChromiumPagePool() : IDisposable
{
    private readonly SemaphoreSlim _destroyPageSemaphore = new(1, 1);
    private readonly ConcurrentQueue<IPage> _pages = new(); // ConcurrentQueue<T> has no Add method, so a collection expression ([]) does not compile here
    private IBrowser? _browser;
    private SemaphoreSlim? _currentPageCount;
    private int _destroyPageTasks;
    private bool _disposing;

    public async Task InitAsync()
    {
        _browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            ExecutablePath = @"C:\Users\aleksandrs.jakovlevs\Downloads\chrome-win\chrome-win\chrome.exe",
            Browser = SupportedBrowser.Chromium,
            Args =
            [
                "--no-sandbox"
            ]
        });

        for (var i = 0; i < 3; i++)
        {
            var page = await _browser.NewPageAsync();
            _pages.Enqueue(page);
        }

        _currentPageCount = new SemaphoreSlim(7, 10);
    }

    public async Task<IPage?> GetPageAsync(CancellationToken token)
    {
        var res = _pages.TryDequeue(out var page);

        if (res && page != null)
        {
            return page;
        }

        await _currentPageCount.WaitAsync(token);

        page = await _browser.NewPageAsync();

        return page;
    }

    public void QueuePageDestruction(IPage page, CancellationToken token)
    {
        var pageToDestroy = page;
        var cancelToken = token;
        Task.Run(() => DestroyPage(pageToDestroy, cancelToken), token);
    }

    private async Task DestroyPage(IPage page, CancellationToken cancelToken)
    {
        Interlocked.Increment(ref _destroyPageTasks);

        if (_disposing)
        {
            page.Dispose();
            return;
        }

        try
        {
            page.Dispose();

            await _destroyPageSemaphore.WaitAsync(cancelToken);

            try
            {
                if (10 - _currentPageCount.CurrentCount <= 3)
                {
                    page = await _browser.NewPageAsync();
                    _pages.Enqueue(page);
                }
                else
                {
                    _currentPageCount.Release();
                }
            }
            finally
            {
                _destroyPageSemaphore.Release();
            }
        }
        finally
        {
            Interlocked.Decrement(ref _destroyPageTasks);
        }
    }

    public void Dispose()
    {
        _disposing = true;

        while (_destroyPageTasks > 0)
        {
            Thread.Sleep(50);
        }

        foreach (var page in _pages.Where(p => !p.IsClosed))
        {
            page.Dispose();
        }

        if (_browser?.IsConnected == true)
        {
            _browser.Dispose();
        }

        _destroyPageSemaphore.Dispose();
        _currentPageCount?.Dispose();

        GC.SuppressFinalize(this);
    }

    ~ChromiumPagePool()
    {
        Dispose();
    }
}
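
For context, here is a sketch of how a conversion request might use this pool. The ConvertAsync method and the HTML payload are illustrative, not part of the repro; the pool API (GetPageAsync, QueuePageDestruction) and the PuppeteerSharp calls (SetContentAsync, PdfDataAsync) are taken from the class above and the library's page API.

```csharp
// Hypothetical caller; one of these runs per conversion request.
public async Task<byte[]?> ConvertAsync(
    ChromiumPagePool pool, string html, CancellationToken token)
{
    var page = await pool.GetPageAsync(token);
    if (page == null)
    {
        return null;
    }

    try
    {
        await page.SetContentAsync(html);
        return await page.PdfDataAsync();
    }
    finally
    {
        // Hand the page back for disposal/recycling rather than reusing it,
        // matching the pool's QueuePageDestruction flow above.
        pool.QueuePageDestruction(page, token);
    }
}
```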

Expected behavior:

All 200 requests successfully convert HTML to PDF

Actual behavior:

At some point conversion fails

Versions

  • Puppeteer-Sharp 16.0.0
  • .NET 8.0

@AleksandrsJakovlevsVisma Are you willing to test some custom build to confirm the fix?

kblok · May 06 '24 00:05

@kblok Sorry for the delay, I was on vacation.

Yes, I don't mind trying a custom build

@AleksandrsJakovlevsVisma I'm exposing the TaskHelper timeout here. Can you see if increasing the timeout there helps?

That's a static class, so changing it will affect the entire library.

kblok · May 13 '24 20:05

@kblok I did the following:

  1. Downloaded that branch as zip and referenced the csproj in my proj
  2. I also made one additional change, shown in the image below
  3. In my code I am setting DefaultTimeout to 30 seconds
  4. During testing I am no longer seeing TargetClosedException, but now I am getting 504 gateway timeouts from the API that is using Puppeteer

@kblok I did the following:

  1. I also made one additional change, seen below

Good catch

  1. During testing I am no longer seeing TargetClosedException, but now I am getting 504 gateway timeouts from the API that is using Puppeteer

That would be something on your side, right?

kblok · May 15 '24 17:05

@kblok

Thank you for helping out with this one.

It is a problem on our side, yes. The PDF conversion takes 6 seconds on average, but the request times out after a minute. With a maximum of 10 simultaneously open pages, 200 requests cannot all finish within a minute. Do you have any advice on how to speed up a simple HTML to PDF conversion? Perhaps there are some magic LaunchOptions args?

In any case, can we expect the configurable default timeout released in the near future?

It is a problem on our side, yes. The PDF conversion takes 6 seconds on average, but the request times out after a minute. With a maximum of 10 simultaneously open pages, 200 requests cannot all finish within a minute. Do you have any advice on how to speed up a simple HTML to PDF conversion? Perhaps there are some magic LaunchOptions args?

PDF generation is quite resource-intensive. If you look at the code, most of the time is spent on the browser side.

In any case, can we expect the configurable default timeout released in the near future?

I can ship it today :)

kblok · May 20 '24 16:05

@kblok

I can ship it today :)

Nice, we'll look forward to it whenever you have the time :)

I think I'm running into this same issue. Can one of you share an example of how to adjust the timeout using @kblok's code change?

kemmis · Aug 05 '24 14:08

@kemmis

Add PuppeteerSharp.Helpers.TaskHelper.DefaultTimeout = 5_000; somewhere in your initialization code. Replace 5_000 with your desired timeout in milliseconds.
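
For example, something like this in startup code (a sketch assuming a build that exposes the setting, with the value in milliseconds as in the 5_000 example above; 30_000 matches the 30-second value tested earlier in this thread):

```csharp
// Set once at startup, before launching the browser.
// TaskHelper.DefaultTimeout is a library-wide static, so this affects
// every internal wait in PuppeteerSharp, not just FrameManager init.
PuppeteerSharp.Helpers.TaskHelper.DefaultTimeout = 30_000; // 30 seconds

var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Args = ["--no-sandbox"]
});
```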