scrapy-proxies
How to check that a proxy is really being used?
In the process_request function, the proxy is passed to the request only if it has a proxy_user_pass; otherwise the function only logs that the proxy is being used and how many are left. Does that mean that a proxy like https://176.37.14.252:8080 does not work?
This is the function:
def process_request(self, request, spider):
    # Don't overwrite with a random one (server-side state for IP)
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return
    request.meta["exception"] = False
    if len(self.proxies) == 0:
        raise ValueError('All proxies are unusable, cannot proceed')

    if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS:
        proxy_address = random.choice(list(self.proxies.keys()))
    else:
        proxy_address = self.chosen_proxy

    proxy_user_pass = self.proxies[proxy_address]

    if proxy_user_pass:
        request.meta['proxy'] = proxy_address
        basic_auth = 'Basic ' + base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = basic_auth
    else:
        log.debug('Proxy user pass not found')
    log.debug('Using proxy <%s>, %d proxies left' % (
        proxy_address, len(self.proxies)))
I tested this middleware without a proxy_user_pass (I don't have one to test with), and the proxy is not used:
import scrapy


class MyipSpider(scrapy.Spider):
    name = 'myip'
    start_urls = ['http://www.mon-ip.com']

    def parse(self, response):
        # Yield the IP address that the page reports for this request
        for ip in response.xpath('//*[@id="PageG"]'):
            yield {
                'ip': ip.xpath('p[3]/span[2]//text()').extract_first(),
            }
This gives:
2018-08-28 15:17:10 [scrapy.proxies] DEBUG : Using proxy <https://pro.xy.add.ress:port>, x proxies left
[...]
2018-08-28 15:17:10 [scrapy.core.scraper] DEBUG : Scraped from <200 http://www.mon-ip.com>
{'ip': 'my.ip.add.ress'}
This change works: https://github.com/aivarsk/scrapy-proxies/pull/43/files
Bump on schizophrene's PR. I was able to use that change and verify that my requests were indeed using a proxy's IP and not my own local IP.
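For anyone reading along, here is a minimal sketch of the behaviour discussed in this thread: the proxy is always set, and the Proxy-Authorization header is added only when credentials exist. It is not the exact diff from the linked PR. apply_proxy is a hypothetical helper, and the plain dicts meta and headers stand in for request.meta and request.headers so the snippet runs without Scrapy installed.

```python
import base64


def apply_proxy(meta, headers, proxy_address, proxy_user_pass=None):
    """Attach a proxy to dict stand-ins for request.meta / request.headers.

    Hypothetical helper for illustration; not part of scrapy-proxies.
    """
    # Always set the proxy, whether or not credentials are configured.
    meta['proxy'] = proxy_address
    if proxy_user_pass:
        # Credentials are optional: add Basic auth only when present.
        token = base64.b64encode(proxy_user_pass.encode()).decode()
        headers['Proxy-Authorization'] = 'Basic ' + token
    return meta, headers


# A proxy without credentials still ends up in meta.
meta, headers = apply_proxy({}, {}, 'https://176.37.14.252:8080')
print(meta)
print(headers)
```

Inside a real middleware the same two steps would presumably operate on request.meta and request.headers directly. To check the result, compare the IP reported by a page such as the one in the spider above against your own address.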