Proxy object has no attribute __dict__
I'm trying to make proxies work, but I always get this error:
File "/home/scraper/virtualenv/lib/python3.5/site-packages/GoogleScraper/scraping.py", line 372, in before_search
if not self.proxy_check(self.proxy):
File "/home/scraper/virtualenv/lib/python3.5/site-packages/GoogleScraper/http_mode.py", line 200, in proxy_check
status = 'Proxy check failed: {host}:{port} is not used while requesting'.format(**self.proxy.__dict__)
AttributeError: 'Proxy' object has no attribute '__dict__'
Proxies get added from the TXT file to the DB, but after that it just keeps giving this error. I am trying to use it from a Python script, not the command line. Any ideas?
The only thing I can think of is that I am using PostgreSQL instead of sqlite3, and I had to change the protocol line in the database model (adding a name) like this:
proto = Column(Enum('socks5', 'socks4', 'http', name='proto'))
But if that were the issue, I don't think the proxies would be added to the DB in the first place?
Picture of DB:
http://i.imgur.com/Tee3w19.png
EDIT: I printed out self.proxy right before the AttributeError and it looks like this:
Proxy(proto='socks5', host='24.126.25.23', port='10361', username='', password='')
I tried changing the problematic line to:
status = 'Proxy check failed: %s:%s is not used while requesting' % (self.proxy.host, self.proxy.port)
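That AttributeError would make sense if Proxy is a collections.namedtuple, which the repr above strongly suggests: namedtuple subclasses define `__slots__ = ()` and, in recent Python 3 releases, no longer expose a `__dict__` property, so `self.proxy.__dict__` fails. The supported way to get the field mapping is `_asdict()`. A minimal sketch, assuming Proxy is a plain namedtuple (the stand-in definition below is mine, not GoogleScraper's actual class):

```python
from collections import namedtuple

# Hypothetical stand-in for GoogleScraper's Proxy record; the fields
# mirror the repr printed in the question.
Proxy = namedtuple('Proxy', ['proto', 'host', 'port', 'username', 'password'])

proxy = Proxy(proto='socks5', host='24.126.25.23', port='10361',
              username='', password='')

# namedtuples set __slots__ = (), and the __dict__ property they once
# carried was removed in later Python 3 releases, so proxy.__dict__
# raises AttributeError. _asdict() returns the same mapping reliably:
status = ('Proxy check failed: {host}:{port} is not used while '
          'requesting'.format(**proxy._asdict()))
print(status)
# Proxy check failed: 24.126.25.23:10361 is not used while requesting
```

So instead of the %s workaround, replacing `self.proxy.__dict__` with `self.proxy._asdict()` on the original line should keep the named-placeholder formatting working across Python versions.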
And now it scrapes, but for some reason it uses the server's OWN IP even though that's set to False. It also updates/changes the proxy in the DB (overwriting the original proxy's IP with the server's own IP) while keeping the old proxy's port.
http://i.imgur.com/PKdDSft.png
(It sets the server IP instead of the proxies provided in the file.) The results are also from German Google (the proxies are US).
In the console it even shows that it's using the proxies correctly:
2015-12-09 10:04:04,902 - GoogleScraper.scraping - INFO - [+] HttpScrape[130.245.168.181:8080][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
2015-12-09 10:04:04,903 - GoogleScraper.scraping - INFO - [+] HttpScrape[173.10.32.105:3128][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
I narrowed the issue down, but I can't figure out how to fix it. In http_mode.py:
def set_proxy(self):
doesn't actually configure the instance to use the proxy, because this line in proxy_check:
text = self.requests.get(self.config.get('proxy_info_url')).text
returns the IP of the server. What am I missing?
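One thing worth checking is what mapping set_proxy() puts on the requests session: if the proxies dict is never assigned (or is built with an unsupported scheme), requests silently falls back to a direct connection, which would explain proxy_info_url reporting the server's own IP. A minimal sketch of the mapping requests expects (the helper name is mine; note that plain requests only gained SOCKS support later, in 2.10 via requests[socks], so a socks5 URL may simply be ignored or rejected on older versions):

```python
def build_proxies(proto, host, port, username='', password=''):
    """Build the dict that requests expects in session.proxies.

    If this mapping is empty or never assigned to the session, requests
    makes a direct connection, i.e. the server's own IP is used.
    """
    auth = '{}:{}@'.format(username, password) if username else ''
    url = '{}://{}{}:{}'.format(proto, auth, host, port)
    # Route both plain and TLS traffic through the same proxy URL.
    return {'http': url, 'https': url}

proxies = build_proxies('socks5', '24.126.25.23', '10361')
print(proxies['http'])  # socks5://24.126.25.23:10361
```

With a session you would then do something like `self.requests.proxies = proxies` (or pass `proxies=` on each get call); if proxy_check still reports the server's IP after that, the mapping is not reaching the actual request.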
Additional info: Python version 3.5.0+, Ubuntu 15.10. Selenium mode works.
Hi! Have you found out how to fix this?
The code above goes through connectionpool.py, part of the urllib3 vendored inside the requests package. The function involved is _new_conn. If using the proxy fails (I'm still checking why), this function falls back to a connection under the local PC's IP instead of the proxy's:

    self.num_connections += 1
    log.debug("Starting new HTTP connection (%d): %s",
              self.num_connections, self.host)
    conn = self.ConnectionCls(host=self.host, port=self.port,
                              timeout=self.timeout.connect_timeout,
                              strict=self.strict, **self.conn_kw)
    return conn
You should see the line "Starting new HTTP connection" in the log. After this fallback, the request goes out via the local PC, so Google can detect your location and redirects the query to the local Google search engine. If I find out why the configured proxy isn't used, I'll post it here.
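To confirm this fallback, you can enable debug logging and watch which host the "Starting new HTTP connection" line names: when a proxy is in use, urllib3's pool connects to the proxy host, so if the line shows the target site's host instead, the proxy was dropped before the connection was made. A minimal sketch (the logger names cover both the vendored and the standalone urllib3; which one applies depends on how requests is installed):

```python
import logging

# Send all debug-level records, including urllib3's
# "Starting new HTTP connection (%d): %s" message from _new_conn, to stderr.
logging.basicConfig(level=logging.DEBUG)

# requests of that era vendors urllib3 as requests.packages.urllib3;
# standalone installs log under plain 'urllib3'. Enable both to be safe.
for name in ('requests.packages.urllib3', 'urllib3'):
    logging.getLogger(name).setLevel(logging.DEBUG)
```

Run a single scrape with this in place and compare the host in the debug line against the proxy host from the DB.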