puppeteer-cluster
CONCURRENCY_PAGE - Only the active tab (page) is automated (the others are pending in the queue) when using
Hello all,
When I use the CONCURRENCY_PAGE concurrency implementation, I have the following issue.
Only the active tab (page) is automated. The other tabs wait for it to complete.
Let's say GitHub is the most recently triggered automation in Puppeteer.
It is active and its automation works, but the other two, Yahoo and Google, are waiting.
If I manually switch to another tab, say Yahoo,
the automation starts working there and the others wait for it to finish.
Below you can see a snippet of my code (adjusted with test names):
const args = [
  // https://github.com/puppeteer/puppeteer/issues/1159 || https://github.com/puppeteer/puppeteer/issues/3119
  "--disable-setuid-sandbox",
  "--enable-automation",
  "--disable-browser-side-navigation",
  "--test-type",
  "--start-maximized",
  "--disable-extensions",
  "--disable-popup-blocking",
  "--disable-infobars",
  "--disable-dev-shm-usage",
  "--disable-gpu",
  "--no-sandbox",
  "--disable-features=InfiniteSessionRestore",
  "--enable-features=NetworkService"
];
const options = {
  headless: false,
  w3c: true,
  useAutomationExtension: false,
  executablePath: 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe',
  // executablePath: 'C:/Program Files/Firefox Nightly/firefox.exe',
  // product: 'firefox',
  args
  // acceptInsecureCerts: true,
  // ignoreHTTPSErrors: true
};
const cluster_test = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_PAGE,
  monitor: false,
  maxConcurrency: 5,
  puppeteerOptions: options,
  timeout: 14400000, // 4 hours
  workerCreationDelay: 500, // 0.5 seconds delay
});
// Task function: runs the automation on the page the cluster provides
const automated_process_one = async ({ page, data }) => {
  return await callFunction
    .testFunction(page, data)
    .then((result) => {
      return result;
    })
    .catch((error) => {
      return cf.errorCatchCustom1(
        error.stack,
        error.message,
        "Error inside cluster.execute:",
        "error_log.txt"
      );
    });
};
app.post("/test", async function(req, res) {
  // Declared with const so concurrent requests do not share one implicit global
  const data = req.body["data"];
  data.url = `https://test..com_` + data['Test'] + '_' +
  try {
    let result = await cluster_test.execute(data, automated_process_one);
    res.send(result);
  } catch (error) {
    res.send(
      cf.errorCatchCustom1(
        error.stack,
        error.message,
        "Error when calling cluster.execute:",
        "error_log.txt"
      )
    );
  }
});
My idea here is that the site I will automate can't have more than one session, but I will have more than one run with the same credentials (user). I want to share the data between the browsers (the runs), which is why I am using CONCURRENCY_PAGE, as the other methods do not share data at all. I hope someone can assist here, as it is a big issue for me. The main idea is to have the runs execute in parallel, but they are actually staying in a queue.
@thomasdondorf, or someone more experienced here, can you please advise how CONCURRENCY_PAGE is supposed to work? Is every page triggered in a browser supposed to run at the same time, or does the last one started become the active one, with the others completing one by one (as each becomes active) once it finishes?
Thank you, I hope this is understandable.
Cheers, Milen
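To tell which of the two behaviours described above is actually happening, it can help to log progress from inside each task and check whether the steps of different runs alternate. Below is a minimal sketch in plain JavaScript. `createProgressLog`, `mark`, and `isInterleaved` are hypothetical names made up for this illustration and are not part of puppeteer-cluster, and the result is only a rough signal (manually switching tabs will also produce interleaving).

```javascript
// Hypothetical diagnostic helper; not part of puppeteer-cluster.
function createProgressLog() {
  const entries = [];
  return {
    entries,
    // Record that `task` reached `step` (call this from inside the task function).
    mark(task, step) {
      entries.push({ task, step, at: Date.now() });
    },
    // True if the log ever returns to a task after another task has made
    // progress, i.e. the steps alternate instead of forming one block per task.
    isInterleaved() {
      const left = new Set(); // tasks we have already moved away from
      let current = null;
      for (const { task } of entries) {
        if (task === current) continue;
        if (left.has(task)) return true;
        if (current !== null) left.add(current);
        current = task;
      }
      return false;
    },
  };
}

// Usage inside a task function (assumption: `data.name` identifies the run):
// const log = createProgressLog();
// const task = async ({ page, data }) => {
//   log.mark(data.name, "start");
//   await page.goto(data.url);
//   log.mark(data.name, "loaded");
// };
```

If `isInterleaved()` stays false after several runs were queued together, each run finished its logged steps in one block, which matches the queue-like behaviour described in this issue.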
I have the same problem
@galegobr01, thanks for writing. I was starting to think that I was the only one with this issue :( Let's hope someone will assist here.
@thomasdondorf Is a CONCURRENCY_BROWSER option possible without incognito?
@galegobr01, let's hope that @thomasdondorf will check this soon, or that someone who is more familiar with how this works will comment.
@thomasdondorf I thought the main purpose of CONCURRENCY_PAGE was to run multiple pages in parallel. I reached this limitation with puppeteer and thought that this library could bypass this restriction.
What is the goal of CONCURRENCY_PAGE then?
@galegobr01, @grapevineai, thanks for sharing that you have the same issue... @thomasdondorf, do you think you can check this one and let us know if there is something you can do, or if we are doing something wrong (or have misunderstood it)?
Thank you anyway. This library is very useful, but there are some things that might make it better, like this one :)
If you run with headless: true, then this concurrency works as expected...
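For anyone who wants to try this, here is a minimal sketch of making the headless setting switchable while keeping the rest of the launch options from the original post. `buildClusterOptions` and the `HEADFUL` environment variable are hypothetical names used only for illustration. The `concurrency`, `maxConcurrency`, and `puppeteerOptions` keys are the ones already used in the snippet above.

```javascript
// Hypothetical helper: builds the object that would be passed to Cluster.launch().
// The concurrency constant is passed in so this function has no dependencies.
function buildClusterOptions(concurrency, { headless = true, args = [] } = {}) {
  return {
    concurrency,
    maxConcurrency: 5,
    puppeteerOptions: { headless, args },
  };
}

// Usage with puppeteer-cluster (not executed here; HEADFUL is an assumed env var):
// const { Cluster } = require("puppeteer-cluster");
// const cluster = await Cluster.launch(
//   buildClusterOptions(Cluster.CONCURRENCY_PAGE, {
//     headless: process.env.HEADFUL !== "1",
//     args: ["--no-sandbox"],
//   })
// );
```

Defaulting to headless keeps the behaviour reported as working here, while still allowing a visible browser for debugging.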
@deldrid1, thanks, maybe that is so, but then I run into the other issue I have, which stops me from using headless: true (even though it is all I want :) )
Not selecting certificate with Chrome in Headless mode (not selected at all for Chromium) #5946