puppeteer-cluster
Proxies
Is it possible to use different proxies per browser instance? I want to create something so that every instance has a different proxy through which the browser will retrieve information.
You can pass browser configuration via puppeteerOptions.
const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 2,
    puppeteerOptions: {
        headless: false,
        devtools: true, // Puppeteer's launch option is spelled "devtools"
        ignoreHTTPSErrors: true,
        timeout: 0,
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--window-size=1920,1080',
            '--proxy-server=http://localhost:8888' // one proxy for the whole cluster
        ],
        ignoreDefaultArgs: ['--enable-automation']
    }
});
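The snippet above applies a single proxy to every worker. As a minimal sketch of the per-instance idea, the helper below builds one launch-options object per proxy, which could then be supplied through the perBrowserOptions setting that comes up later in this thread. The name `buildPerBrowserOptions`, the `LaunchOptionsSketch` type, and the second proxy address are hypothetical, and whether the cluster honors these options depends on the puppeteer-cluster version and concurrency mode in use.

```typescript
// Hypothetical shape: only the fields used in this sketch.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
}

// Hypothetical helper: returns one launch-options object per proxy address.
function buildPerBrowserOptions(proxies: string[]): LaunchOptionsSketch[] {
  return proxies.map((proxy) => ({
    headless: false,
    args: ['--no-sandbox', `--proxy-server=${proxy}`],
  }));
}

// Example addresses; the second one is an assumption for illustration.
const perBrowserOptions = buildPerBrowserOptions([
  'http://localhost:8888',
  'http://localhost:8889',
]);

console.log(perBrowserOptions.map((options) => options.args));
```

The resulting array would be passed as perBrowserOptions to Cluster.launch, presumably with Cluster.CONCURRENCY_BROWSER and maxConcurrency set to the array length so that each browser gets its own entry.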
+1, I would also like to use a different proxy for each browser instance. @Yannicko, have you found a solution?
I found the solution literally a few searches after posting my comment.
Anyway, for anyone else looking for a solution, use this: proxy-per-page
Somehow it doesn't work for me :/
I tried puppeteerOptions and perBrowserOptions, individually and together, and the proxy is completely ignored.
I experience the same behavior. Have you found a solution to this yet?
Edit: It's a bit late, but I found a solution to this problem if you are using a proxy server. Please read on:
First, I created a new concurrency implementation by copying the Browser concurrency and renaming it to BrowserProxy.
Then I changed the code in workerInstance to check whether the options contain the --proxy-server argument, like this:
class BrowserProxy extends ConcurrencyImplementation_1.default {
    ...
    let page;
    let context; // puppeteer typings are old...
    // Look up the --proxy-server flag; proxyServer stays null when the flag is absent
    const proxyArg = (options.args || []).find(arg => arg.startsWith('--proxy-server='));
    const proxyServer = proxyArg ? proxyArg.slice('--proxy-server='.length) : null;
    const contextOptions = { proxyServer };
    return {
        jobInstance: () => __awaiter(this, void 0, void 0, function* () {
            ...
If the argument is present, its value is saved and passed to createIncognitoBrowserContext like this:
...
jobInstance: () => __awaiter(this, void 0, void 0, function* () {
    yield util_1.timeoutExecute(BROWSER_TIMEOUT, (() => __awaiter(this, void 0, void 0, function* () {
        context = yield chrome.createIncognitoBrowserContext(contextOptions);
        page = yield context.newPage();
    }))());
    return {
        ...
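The flag lookup used in this patch can be isolated into a small helper. The sketch below is an illustration only and is not part of puppeteer-cluster; `extractProxyServer` is a hypothetical name. It returns undefined instead of throwing when no --proxy-server flag is present, and it keeps everything after the flag prefix, so a proxy URL that itself contains `=` is not cut short.

```typescript
// Hypothetical helper: pulls the proxy address out of a Chrome args array.
function extractProxyServer(args: string[] = []): string | undefined {
  const prefix = '--proxy-server=';
  // find() yields undefined when the flag is missing, so guard before slicing.
  const flag = args.find((arg) => arg.startsWith(prefix));
  if (flag === undefined) {
    return undefined;
  }
  // Treat an empty value ("--proxy-server=") the same as a missing flag.
  return flag.slice(prefix.length) || undefined;
}

console.log(extractProxyServer(['--no-sandbox', '--proxy-server=http://localhost:8888']));
// → http://localhost:8888
console.log(extractProxyServer(['--no-sandbox']));
// → undefined
```

The returned value can then be placed in the context options, leaving the context without a proxy override when the result is undefined.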
After that, register the new concurrency in all of the Concurrency files so that puppeteer-cluster can use it, like this:
const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSERPROXY,
    maxConcurrency: 1,
    timeout: properties.taskTimeout,
    puppeteerOptions: {
        headless: false,
        ignoreHTTPSErrors: true,
        args: [
            `--proxy-server=${proxy_server}`,
            '--no-sandbox',
        ]
    },
    puppeteer: puppeteer,
    monitor: false,
    retryLimit: 3,
    retryDelay: 3500
});
There is probably a better way to handle this, but it was my first approach to fixing the issue. Let me know if it helps you in any way.
@cedricdsc Sorry for the delayed response; I only just got a chance to reply to your question. I did something similar to your approach, with a small difference. Here's my solution:
I created a proxyServer variable holding the proxy server for this instance:
const proxyServer = chrome.process()?.spawnargs.find(it => it.startsWith("--proxy-server"))?.split("=")[1] || undefined;
and I changed the context creation to:
context = await chrome.createIncognitoBrowserContext({ proxyServer });
That's another way to do it. Good you found it too 👍
Hi all,
I have found a solution for those who have been struggling with the lack of per-request or per-browser proxy support in puppeteer-cluster. I was able to achieve this using the puppeteer-page-proxy package.
I hope this helps others in a similar situation. Please see the example code below for implementation details.
proxies.json
[
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx"
]
index.ts
import { Cluster } from 'puppeteer-cluster';
import ProxyList from '../proxies.json';
import useProxy from 'puppeteer-page-proxy';

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
        monitor: true,
        puppeteerOptions: {
            headless: false
        }
    });

    // Assign a randomly chosen proxy to the page before navigating
    await cluster.task(async ({page, data: url}) => {
        await useProxy(page, `direct://${getProxy()}`);
        await page.goto(url);
    });

    cluster.queue('https://ipinfo.io');
    cluster.queue('https://ipinfo.io');

    await cluster.idle();
    await cluster.close();
})();

// Pick a random entry from proxies.json
function getProxy() {
    return ProxyList[Math.floor(Math.random() * ProxyList.length)];
}
The end result can be seen in the screenshot:
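One caveat with the random getProxy() in index.ts is that consecutive jobs can land on the same proxy. A minimal alternative sketch is a round-robin rotator, which hands out the proxies in order and wraps around at the end of the list. The name `createProxyRotator` and the placeholder addresses are hypothetical; this is an illustration, not part of puppeteer-cluster or puppeteer-page-proxy.

```typescript
// Hypothetical helper: returns a function that cycles through the proxy list in order.
function createProxyRotator(proxies: string[]): () => string {
  if (proxies.length === 0) {
    throw new Error('proxy list is empty');
  }
  let next = 0;
  return () => {
    // The modulo keeps the index in range, so the lookup always hits an entry.
    const proxy = proxies[next % proxies.length] as string;
    next += 1;
    return proxy;
  };
}

// Placeholder addresses for illustration only.
const nextProxy = createProxyRotator(['10.0.0.1:8000', '10.0.0.2:8000']);
console.log(nextProxy()); // → 10.0.0.1:8000
console.log(nextProxy()); // → 10.0.0.2:8000
console.log(nextProxy()); // → 10.0.0.1:8000
```

In the task callback, a call to nextProxy() would take the place of getProxy(), with the rotator created once from the loaded proxies.json list.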
I have implemented proxy support in my forked version, available at: https://github.com/joone/headless-cluster.