python-email-crawler

TypeError: expected string or buffer

Open • ghost opened this issue 11 years ago • 13 comments

Sometimes it runs, sometimes it doesn't.

[14:22:38] INFO::email_crawler - Crawling http://www.google.com.au/search?q=electrician&start=0
[14:22:39] ERROR::email_crawler - Exception at url: http://www.google.com.au/search?q=electrician&start=0
HTTP Error 503: Service Unavailable
[14:22:39] ERROR::email_crawler - EXCEPTION: expected string or buffer 

ghost, May 18 '14

+1 Same here!

pmuens, Sep 05 '14

+1 Same here. Could you please suggest a fix for this? Thank you.

rkshakya, Mar 23 '15

+1 Same problem

DonatoNapoli, Nov 11 '15

python email_crawler.py "intext:gmail filetype:csv"
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Keywords to Google for: intext:gmail filetype:csv
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=0
[10:14:14] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=10
...
[10:14:59] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=390
HTTP Error 503: Service Unavailable
[10:14:59] ERROR::email_crawler - EXCEPTION: expected string or buffer 
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module> 
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer
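Both tracebacks end at line 65, where google_url_regex.findall(data) runs immediately after the 503 has been logged. That pattern suggests the download step catches the HTTP error and hands back None instead of page text, and in Python 2 calling a regex's findall on None raises exactly "TypeError: expected string or buffer" (Python 3 words it "expected string or bytes-like object"). The sketch below assumes that reading: the pattern and the extract_result_urls helper are hypothetical stand-ins, not the project's code, and the guard only stops the crash; it does not make Google answer.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for google_url_regex in email_crawler.py; the real
# pattern may differ. It pulls target URLs out of Google's /url?q=... links.
google_url_regex = re.compile(r"/url\?q=(http[^&]+)&")


def extract_result_urls(data: Optional[str]) -> List[str]:
    """Return result URLs from a downloaded results page.

    Returns an empty list when the download failed (data is None), which is
    the case where an unguarded findall(data) raises TypeError.
    """
    if data is None:
        # The request failed (for example with HTTP 503); skip this page
        # instead of passing None to the regex.
        return []
    return google_url_regex.findall(data)
```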

dcondrey, Jan 22 '16

Same problem.

hamdi-islam, Apr 25 '16

This issue should be resolved by this merged pull request: https://github.com/samwize/python-email-crawler/pull/7

dcondrey, May 03 '16

The issue is still not resolved. Same here with the latest version cloned from Git on my Linux machine.

thomaslc66, Jul 14 '16

I still have a problem with "TypeError: expected string or buffer". Can anyone help?

mrkkr, Feb 03 '17

I have the same issue as well.

vizieral, Apr 09 '17

Here is a solution to your problem:

  1. Open the file email_crawler.py (if you are working in a terminal, you can edit it with nano email_crawler.py).
  2. Go to line 24, which reads MAX_SEARCH_RESULTS = 500, and change it to MAX_SEARCH_RESULTS = 100.

Note: the reason is that the script requests up to 500 search results from Google, so Google treats the burst of requests as spam, that is, as a script scraping the web through its search engine, and starts rejecting them, which shows up as "HTTP Error 503: Service Unavailable" in the logs.
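The start=0, start=10, ... start=390 values in the logs suggest the crawler asks Google for results ten per page, so MAX_SEARCH_RESULTS effectively sets how many search requests one keyword triggers. A rough sketch under that assumption (RESULTS_PER_PAGE and search_page_urls are hypothetical names, not taken from email_crawler.py) shows why lowering the limit from 500 to 100 shrinks the burst from 50 requests to 10:

```python
from typing import List
from urllib.parse import urlencode

# Assumption: Google is paged 10 results at a time, as the start=0, 10, 20, ...
# parameters in the logs suggest.
RESULTS_PER_PAGE = 10


def search_page_urls(query: str, max_search_results: int) -> List[str]:
    """Build one Google search URL per results page, up to max_search_results."""
    return [
        "http://www.google.com/search?" + urlencode({"q": query, "start": start})
        for start in range(0, max_search_results, RESULTS_PER_PAGE)
    ]


if __name__ == "__main__":
    for limit in (500, 100):
        print(limit, "results ->", len(search_page_urls("electrician", limit)), "requests")
```

Fewer requests per run lowers the chance of tripping Google's spam detection, but it does not guarantee that a given run will go through.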

kevingatera, Jun 02 '17

I've got it too, and what @kevingatera suggested didn't work. It happens before the script even gets past the first page, so it's not the script being blocked. The exact error I get is:

:~/python-email-crawler$ python email_crawler.py "ios developers"
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Keywords to Google for: ios developers
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Crawling http://www.google.com/search?q=ios+developers&start=0
[19:05:06] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=ios+developers&start=0
HTTP Error 503: Service Unavailable
[19:05:06] ERROR::email_crawler - EXCEPTION: expected string or buffer
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module>
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer

charlieporth1, Feb 21 '18

@charlieporth1 What's happening is that Google blocks your IP almost as soon as it receives your request. Using another computer or IP address will work.
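If Google is rate-limiting the client, one generic mitigation besides changing computers is to wait and retry when a 503 comes back, doubling the pause each time (exponential backoff). This is a swapped-in technique, not something email_crawler.py is known to do; fetch_with_backoff, urlopen_text, their parameters, and the 5-second base delay are illustrative assumptions, and a block on the IP may well outlast any reasonable wait.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def urlopen_text(url: str) -> str:
    """Download url and return its body as text (plain urllib, no retries)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], str] = urlopen_text,
    max_retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url), retrying HTTP 503 answers with exponential backoff.

    Returns the page text, or None if every attempt was answered with 503.
    Other HTTP errors are re-raised unchanged.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # not a rate-limit signal; let the caller see it
            if attempt == max_retries:
                return None  # still blocked after all retries
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    return None
```

A caller would use page = fetch_with_backoff(url) and skip the page when the result is None, rather than passing None on to the regex.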

kevingatera, Feb 22 '18

@kevingatera It turns out I was using torify, and that didn't help. You should include IP rotation similar to what's in here. I would help you if I knew more about Python.

charlieporth1, Feb 25 '18