Proxies

Open kc1nn4y opened this issue 4 years ago • 10 comments

Is it possible to use different proxies per browser instance? I want to create something so that every instance has a different proxy through which the browser will retrieve information.

kc1nn4y avatar Jan 23 '21 19:01 kc1nn4y

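You can pass browser launch options, including a proxy, via puppeteerOptions. Note that these options apply to every browser the cluster launches, so all instances share the same proxy: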

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_PAGE,
        maxConcurrency: 2,
        puppeteerOptions: {
            headless: false,
            devtools: true,
            ignoreHTTPSErrors: true,
            timeout: 0,
            args: [
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--window-size=1920,1080',
                '--proxy-server=http://localhost:8888'
            ],
            ignoreDefaultArgs: ['--enable-automation']
        }
    });

ejames17 avatar Feb 16 '21 21:02 ejames17

+1, I would also like to use a different proxy for each browser instance. @Yannicko have you found a solution?

amunim avatar Feb 26 '21 07:02 amunim

I found the solution literally a few searches after I posted the comment.

Anyway, anyone else looking for a solution can use this: proxy-per-page

amunim avatar Mar 01 '21 08:03 amunim

Somehow it doesn't work for me :/

farruhsydykov avatar Oct 12 '21 17:10 farruhsydykov

I tried puppeteerOptions and perBrowserOptions, both individually and together, and the proxy is completely ignored.
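For readers trying the perBrowserOptions route, here is a minimal sketch of how the per-browser array could be built. buildPerBrowserOptions is a hypothetical helper (not part of puppeteer-cluster) and the proxy addresses are placeholders. As far as I understand the library, perBrowserOptions is only read when a separate browser is launched per worker (Cluster.CONCURRENCY_BROWSER), which may explain the proxy being ignored in other modes; treat that as an assumption to verify against your version.

```javascript
// Hypothetical helper (not part of puppeteer-cluster): builds one launch-options
// object per proxy, suitable for the `perBrowserOptions` array.
function buildPerBrowserOptions(proxies, baseOptions = {}) {
  // Drop any proxy flag already present so each browser gets exactly one.
  const baseArgs = (baseOptions.args || []).filter(
    (arg) => !arg.startsWith('--proxy-server=')
  );
  return proxies.map((proxy) => ({
    ...baseOptions,
    args: [...baseArgs, `--proxy-server=${proxy}`],
  }));
}

const perBrowserOptions = buildPerBrowserOptions(
  ['http://10.0.0.1:8080', 'http://10.0.0.2:8080'], // placeholder addresses
  { headless: true, args: ['--no-sandbox'] }
);
console.log(perBrowserOptions);
```

The resulting array would then be passed to Cluster.launch together with concurrency: Cluster.CONCURRENCY_BROWSER and a maxConcurrency equal to the array length.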

hatemjaber avatar Dec 30 '21 12:12 hatemjaber

I experience the same behavior. Have you found a solution to this yet?

Edit: It's a bit late, but I found a solution to this problem if you are using a proxy server. Please continue reading:

First, I created a new concurrency implementation by copying the Browser concurrency and renaming it to BrowserProxy. Then I changed the code in workerInstance to check whether the options contain the --proxy-server argument, like this:

class BrowserProxy extends ConcurrencyImplementation_1.default {

  ...

  let page;
  let context; // puppeteer typings are old...
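  // Read the proxy from the launch args; stays null when no --proxy-server flag is set
  const proxyArg = (options.args || []).find(arg => arg.startsWith('--proxy-server='));
  const proxyServer = proxyArg ? proxyArg.split('--proxy-server=')[1] : null;
  const contextOptions = proxyServer ? {proxyServer} : {};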
  return {
    jobInstance: () => __awaiter(this, void 0, void 0, function* () {

  ...

If so, the proxy-server value is saved and passed to createIncognitoBrowserContext like this:

  ...

  jobInstance: () => __awaiter(this, void 0, void 0, function* () {
                      yield util_1.timeoutExecute(BROWSER_TIMEOUT, (() => __awaiter(this, void 0, void 0, function* () {
                          context = yield chrome.createIncognitoBrowserContext(contextOptions);
                          page = yield context.newPage();
                      }))());
                      return {
  ...
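Outside the compiled library code, the same idea (one proxied context per job) can be sketched as a standalone function. openProxiedPage is a hypothetical name. The sketch assumes that older Puppeteer releases expose createIncognitoBrowserContext while newer ones expose createBrowserContext, and that both accept a proxyServer option; check this against the Puppeteer version you have installed.

```javascript
// Hypothetical helper: opens an isolated browser context (optionally proxied)
// and a page inside it. `browser` is expected to be a Puppeteer Browser instance.
async function openProxiedPage(browser, proxyServer) {
  // Assumption: the method name differs between Puppeteer versions,
  // so use whichever one the given browser object provides.
  const create = browser.createBrowserContext || browser.createIncognitoBrowserContext;
  if (typeof create !== 'function') {
    throw new Error('Browser does not support creating contexts');
  }
  // Only pass proxyServer when one was actually provided.
  const contextOptions = proxyServer ? { proxyServer } : {};
  const context = await create.call(browser, contextOptions);
  const page = await context.newPage();
  return { context, page };
}
```

Closing the returned context when the job finishes (context.close()) releases the page and its proxy settings together.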

After that, update the other concurrency files so that puppeteer-cluster recognizes the new concurrency type. It can then be used like this:

const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSERPROXY,
    maxConcurrency: 1,
    timeout: properties.taskTimeout, 
    puppeteerOptions: {
        headless: false,
        ignoreHTTPSErrors: true,
        args: [
          `--proxy-server=${proxy_server}`,
          '--no-sandbox',
        ]
    },
    puppeteer: puppeteer,
    monitor: false,
    retryLimit: 3,
    retryDelay: 3500
});

There is probably a better way to handle that, but this was my first approach in fixing this issue. Let me know if that helped you in any way.

code-ric avatar Jan 14 '22 20:01 code-ric

@cedricdsc I'm sorry for the delayed response, just got a chance to reply to your question. I did something similar to you but a little different, here's my solution:

I created a proxyServer variable holding the proxy server for this instance:

    const proxyServer = chrome.process()?.spawnargs.find(it => it.startsWith("--proxy-server"))?.split("=")[1] || undefined;

and I changed the context creation to:

    context = await chrome.createIncognitoBrowserContext({ proxyServer });
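Both variants in this thread pull the proxy out of a Chrome argument list. A small sketch of that parsing step as a reusable function follows; extractProxyServer is a hypothetical name. It slices off the flag prefix instead of splitting on "=", so a value that itself contains "=" is kept intact.

```javascript
// Hypothetical helper: returns the value of --proxy-server from a Chrome
// argument list, or undefined when the flag is absent or empty.
function extractProxyServer(args = []) {
  const prefix = '--proxy-server=';
  const flag = args.find(
    (arg) => typeof arg === 'string' && arg.startsWith(prefix)
  );
  // slice() keeps everything after the prefix, even if the value contains '='.
  return flag ? flag.slice(prefix.length) || undefined : undefined;
}

console.log(extractProxyServer(['--no-sandbox', '--proxy-server=http://localhost:8888']));
// prints "http://localhost:8888"
console.log(extractProxyServer(['--no-sandbox']));
// prints undefined
```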

hatemjaber avatar Jan 17 '22 14:01 hatemjaber

That's another way to do it. Good you found it too 👍

code-ric avatar Jan 17 '22 15:01 code-ric

Hi all,

I have found a solution for those who have been struggling with the lack of per-request or per-browser proxy support in puppeteer-cluster. I was able to achieve this by utilising the puppeteer-page-proxy package.

I hope this solution helps others in a similar situation. Please see the example code below for implementation details.

proxies.json

[
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx"
]

index.ts

import { Cluster } from 'puppeteer-cluster';
import ProxyList from '../proxies.json';
import useProxy from 'puppeteer-page-proxy';

(async () => {

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
        monitor: true,        
        puppeteerOptions: {
            headless: false
        }
    });

    await cluster.task(async ({page, data: url}) => {

        await useProxy(page, `direct://${getProxy()}`);

        await page.goto(url);
    });

    cluster.queue('https://ipinfo.io');
    cluster.queue('https://ipinfo.io');

    await cluster.idle();
    await cluster.close();
})();

function getProxy() {
    return ProxyList[Math.floor(Math.random() * ProxyList.length)];
}

The end result can be seen in the screenshot: image
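The random getProxy above can hand the same proxy to consecutive tasks. If an even spread matters, a round-robin picker is one alternative; the sketch below is not from the thread. createProxyRotator is a hypothetical helper, the addresses are placeholders, and the default http scheme is an assumption (pass 'direct' as the second argument to mirror the prefix used in the example above).

```javascript
// Hypothetical helper: cycles through a proxy list in order and prefixes
// bare "host:port" entries with a scheme (http is an assumed default).
function createProxyRotator(proxies, defaultScheme = 'http') {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('Proxy list must be a non-empty array');
  }
  let index = 0;
  return function nextProxy() {
    const entry = proxies[index];
    // Wrap around to the first proxy after the last one has been used.
    index = (index + 1) % proxies.length;
    // Entries that already carry a scheme are returned unchanged.
    return entry.includes('://') ? entry : `${defaultScheme}://${entry}`;
  };
}

const nextProxy = createProxyRotator(['10.0.0.1:3128', 'socks5://10.0.0.2:1080']);
console.log(nextProxy()); // http://10.0.0.1:3128
console.log(nextProxy()); // socks5://10.0.0.2:1080
console.log(nextProxy()); // http://10.0.0.1:3128
```

Inside the task, the call would become await useProxy(page, nextProxy()) in place of the random pick.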

RestfuI avatar Jan 18 '23 23:01 RestfuI

I have implemented proxy support in my forked version, available at: https://github.com/joone/headless-cluster.

joone avatar Mar 15 '24 21:03 joone