CONCURRENCY_PAGE - Only the active tab (page) is automated (the others are pending, in the queue) when using

Open mpalavrov opened this issue 4 years ago • 9 comments

Hello all,

When I use the CONCURRENCY_PAGE concurrency implementation, I run into the following issue.

Only the active tab (page) is automated; the others wait for it to finish. Say GitHub is the most recently triggered automation with Puppeteer: its tab is active and its automation works, but the other two, Yahoo and Google, are waiting. If I manually switch to another tab, say Yahoo, the automation starts working there and the others wait for it to finish.

Below you can see a snippet of my code (adjusted with test names)

const args = [
    // https://github.com/puppeteer/puppeteer/issues/1159 || https://github.com/puppeteer/puppeteer/issues/3119
    "--disable-setuid-sandbox",
    "--enable-automation",
    "--disable-browser-side-navigation",
    "--test-type",
    "--start-maximized",
    "--disable-extensions",
    "--disable-popup-blocking",
    "--disable-infobars",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--no-sandbox",
    "--disable-features=InfiniteSessionRestore",
    "--enable-features=NetworkService"
];

const options = {
    headless: false,
    w3c: true,
    useAutomationExtension: false,
    executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
    // executablePath: 'C:/Program Files/Firefox Nightly/firefox.exe',
    // product: 'firefox',
    args
    // acceptInsecureCerts: true,
    // ignoreHTTPSErrors: true
};

    const cluster_test = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_PAGE,
        monitor: false,
        maxConcurrency: 5,
        puppeteerOptions: options,
        timeout: 14400000, // 4 hours
        workerCreationDelay: 500, // 0.5 seconds delay
    });


    // Task executed for each queued job; on failure it returns whatever
    // cf.errorCatchCustom1 produces instead of throwing.
    const automated_process_one = async ({ page, data }) => {
        try {
            return await callFunction.testFunction(page, data);
        } catch (error) {
            return cf.errorCatchCustom1(
                error.stack,
                error.message,
                "Error inside cluster.execute:",
                "error_log.txt"
            );
        }
    };

app.post("/test", async function(req, res) {
        // Declared with const so concurrent requests do not share an implicit global.
        const data = req.body["data"];
                data.url = `https://test..com_` + data['Test'] + '_' +
        try {
            let result = await cluster_test.execute(data, automated_process_one);
            res.send(result);
        } catch (error) {
            res.send(
                cf.errorCatchCustom1(
                    error.stack,
                    error.message,
                    "Error when calling cluster.execute:",
                    "error_log.txt"
                )
            );
        }
    });

My situation is that the site I am automating does not allow more than one session, but I will have more than one run with the same credentials (user). I want to share data between the runs, which is why I am using CONCURRENCY_PAGE; the other modes do not share data at all. I hope someone can help, as this is a big issue for me. The main idea is to have the runs execute in parallel, but in practice they wait in a queue.
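One way to confirm the symptom described above (tabs stalling while in the background, rather than just running slowly) would be to timestamp each step of every job and compare the gaps afterwards. The following is only a minimal sketch: `createProgressLog`, `mark`, and `longestGap` are hypothetical helper names, not part of puppeteer-cluster or Puppeteer.

```javascript
// Hypothetical helper (not a puppeteer-cluster API): collects { job, step, at }
// entries so progress per tab can be compared after the run.
function createProgressLog(now = Date.now) {
    const entries = [];
    const start = now();
    return {
        entries,
        // Record that `job` reached `step`, in milliseconds since the log was created.
        mark(job, step) {
            entries.push({ job, step, at: now() - start });
        },
        // Largest pause in milliseconds between two consecutive marks of one job.
        longestGap(job) {
            const times = entries.filter((e) => e.job === job).map((e) => e.at);
            let gap = 0;
            for (let i = 1; i < times.length; i += 1) {
                gap = Math.max(gap, times[i] - times[i - 1]);
            }
            return gap;
        },
    };
}
```

Inside the task function, something like `log.mark(data['Test'], 'after login')` could be called between Puppeteer steps. If one job's longest gap is roughly as long as another job's whole run, that suggests it sat idle while its tab was not the active one.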

mpalavrov, Jun 05 '20 12:06

@thomasdondorf, or someone more experienced here, can you please advise how CONCURRENCY_PAGE is supposed to work? Are all pages opened in a browser supposed to run at the same time, or does the most recently started page become the active one, with the others completing one by one as each becomes active in turn?

Thank you, I hope it is understandable.

Cheers, Milen
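For reference, the behaviour being asked about can be modelled without a browser. As far as the puppeteer-cluster README describes it, `maxConcurrency` is the maximum number of parallel workers, so the expectation is that up to that many jobs are in flight at once. The sketch below is a toy model of that expectation only; it is not the library's implementation, and `runWithLimit` and `fakeAutomation` are made-up names.

```javascript
// Toy model (not puppeteer-cluster code): run queued jobs with at most
// `maxConcurrency` of them in flight at any moment.
async function runWithLimit(jobs, maxConcurrency, task) {
    const results = new Array(jobs.length);
    let next = 0;     // index of the next job to hand out
    let inFlight = 0; // jobs currently running
    let peak = 0;     // highest value inFlight ever reached

    async function worker() {
        while (next < jobs.length) {
            const index = next;
            next += 1;
            inFlight += 1;
            peak = Math.max(peak, inFlight);
            try {
                results[index] = await task(jobs[index]);
            } finally {
                inFlight -= 1;
            }
        }
    }

    const workerCount = Math.min(maxConcurrency, jobs.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return { results, peak };
}

// Hypothetical stand-in for a page automation: waits `ms`, then returns the name.
const fakeAutomation = (job) =>
    new Promise((resolve) => setTimeout(() => resolve(job.name), job.ms));
```

With three jobs and a limit of 5, all three start immediately in this model; with a limit of 1 they run strictly one after another, which is what the headful behaviour reported in this thread looks like from the outside.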

mpalavrov, Jun 13 '20 12:06

I have the same problem

galegobr01, Jul 29 '20 18:07

@galegobr01, thanks for writing. I was starting to think I was the only one with this issue :( Let's hope someone can help here.

mpalavrov, Jul 30 '20 11:07

@thomasdondorf Is a CONCURRENCY_BROWSER option possible without incognito?

galegobr01, Jul 30 '20 11:07

@galegobr01, let's hope that @thomasdondorf will check this soon, or that someone who knows better how this works will comment.

mpalavrov, Aug 16 '20 07:08

@thomasdondorf I thought the main purpose of CONCURRENCY_PAGE was to run multiple pages in parallel. I reached this limitation with puppeteer and thought that this library could bypass this restriction.

What is the goal of CONCURRENCY_PAGE then?

rileyai-dev, Aug 26 '20 11:08

@galegobr01, @grapevineai, thanks for sharing that you have the same issue. @thomasdondorf, do you think you could check this one and let us know whether there is something you can do, or whether we are doing something wrong (or have misunderstood how it works)?

Thank you anyway. This library is very useful, but there are some things that could make it better, like this one :)

mpalavrov, Sep 04 '20 06:09

If you run with headless: true, then this concurrency works as expected...
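Applied to the options from the original post, that change might look like the sketch below. It only shows the shape of the configuration objects: the `args` list is shortened for illustration, and the `concurrency: Cluster.CONCURRENCY_PAGE` entry is left as a comment so the snippet runs without puppeteer-cluster installed.

```javascript
// Launch options from the original post, reduced to the relevant keys
// (the shortened args list is an illustration, not a recommendation).
const options = {
    headless: false,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
};

// Copy of the options with only `headless` flipped, as suggested above.
const headlessOptions = { ...options, headless: true };

// Shape of the object that would be passed to Cluster.launch(...).
const clusterConfig = {
    // concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 5,
    puppeteerOptions: headlessOptions,
};
```

Spreading into a new object leaves the original `options` untouched, so switching between headful and headless runs for comparison stays a one-line change.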

deldrid1, Sep 17 '20 19:09

@deldrid1, thanks, that may be so, but then I hit the other issue I have, which stops me from using headless: true (even though it is all I want :) ): Not selecting certificate with Chrome in Headless mode (not selected at all for Chromium) #5946

mpalavrov, Sep 18 '20 10:09