scrapy-proxies

How to check that a proxy is really being used?

Open ravillarreal opened this issue 6 years ago • 3 comments

In the process_request function, the proxy is set on the request only if it has a proxy_user_pass; otherwise the middleware only logs that the proxy is being used and how many are left. Does that mean a proxy without credentials, like https://176.37.14.252:8080, does not work?

This is the function:

def process_request(self, request, spider):
     # Don't overwrite with a random one (server-side state for IP)
     if 'proxy' in request.meta:
         if request.meta["exception"] is False:
             return
     request.meta["exception"] = False
     if len(self.proxies) == 0:
         raise ValueError('All proxies are unusable, cannot proceed')

     if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS:
         proxy_address = random.choice(list(self.proxies.keys()))
     else:
         proxy_address = self.chosen_proxy

     proxy_user_pass = self.proxies[proxy_address]

     if proxy_user_pass:
         request.meta['proxy'] = proxy_address
         basic_auth = 'Basic ' + base64.b64encode(proxy_user_pass.encode()).decode()
         request.headers['Proxy-Authorization'] = basic_auth
     else:
         log.debug('Proxy user pass not found')
     log.debug('Using proxy <%s>, %d proxies left' % (
             proxy_address, len(self.proxies)))
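
For illustration, here is a minimal sketch of the alternative behaviour the question points at: set the proxy on every request and add the Proxy-Authorization header only when credentials exist. The helper name apply_proxy and the plain dicts standing in for request.meta and request.headers are assumptions made so the snippet runs without Scrapy; it is not the library's code or any particular patch.

```python
import base64


def apply_proxy(meta, headers, proxy_address, proxy_user_pass):
    """Hypothetical helper: `meta` and `headers` are plain dicts standing in
    for request.meta and request.headers."""
    # Always route the request through the chosen proxy.
    meta['proxy'] = proxy_address

    # Only authenticated proxies need the Proxy-Authorization header.
    if proxy_user_pass:
        token = base64.b64encode(proxy_user_pass.encode()).decode()
        headers['Proxy-Authorization'] = 'Basic ' + token


if __name__ == '__main__':
    meta, headers = {}, {}
    # A proxy without credentials is still assigned to the request.
    apply_proxy(meta, headers, 'https://176.37.14.252:8080', '')
    print(meta, headers)
```

With this arrangement the "Using proxy" log line and the value of meta['proxy'] would agree for both kinds of proxy.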

Aug 16 '18 17:08 ravillarreal

I tested this middleware without a proxy_user_pass (I don't have one to test with), and the proxy is not used:

import scrapy


class MyipSpider(scrapy.Spider):
    name = 'myip'
    start_urls = ['http://www.mon-ip.com']

    def parse(self, response):
        # The page echoes back the IP address it sees for the client.
        for ip in response.xpath('//*[@id="PageG"]'):
            yield {
                'ip': ip.xpath('p[3]/span[2]//text()').extract_first(),
            }

gives:

2018-08-28 15:17:10 [scrapy.proxies] DEBUG: Using proxy <https://pro.xy.add.ress:port>, x proxies left
[...]
2018-08-28 15:17:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.mon-ip.com>
{'ip': 'my.ip.add.ress'}
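
The check in this test comes down to comparing the address an IP-echo page reports with your own address. A small sketch of that comparison follows; the function name classify_reported_ip and the sample address 203.0.113.7 (a documentation-range IP) are assumptions for illustration, and a proxy's exit address may differ from its listening address, hence the 'other' case.

```python
from urllib.parse import urlsplit


def classify_reported_ip(reported_ip, own_ip, proxy_url):
    """Hypothetical helper: classify the IP echoed back by an IP-echo page.

    Returns 'direct' if the page saw our own address, 'proxy' if it saw the
    proxy's host, and 'other' if it saw some different address.
    """
    reported_ip = reported_ip.strip()
    if reported_ip == own_ip:
        return 'direct'
    if reported_ip == urlsplit(proxy_url).hostname:
        return 'proxy'
    return 'other'


if __name__ == '__main__':
    proxy = 'https://176.37.14.252:8080'
    # Same situation as the log above: the page reports our own address.
    print(classify_reported_ip('203.0.113.7', '203.0.113.7', proxy))
```

A 'direct' result, as in the log above, means the request bypassed the proxy even though the middleware logged "Using proxy".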

Aug 25 '18 13:08 schiz0phr3ne

Bump on schiz0phr3ne's PR. I was able to use that change and verify that my requests were indeed using a proxy's IP and not my own local IP.

Nov 09 '19 18:11 BriungRi