python-email-crawler
Searches Google and crawls the results for related email addresses.
Sometimes it runs, sometimes it doesn't. ``` [14:22:38] INFO::email_crawler - Crawling http://www.google.com.au/search?q=electrician&start=0 [14:22:39] ERROR::email_crawler - Exception at url: http://www.google.com.au/search?q=electrician&start=0 HTTP Error 503: Service Unavailable [14:22:39] ERROR::email_crawler - EXCEPTION: expected string...
Is there any way to speed the python program up by adding multithreading?
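One way to approach this is a thread pool around the page fetches, since crawling is mostly network-bound. The following is a minimal Python 3 sketch, not the project's code: `crawl_concurrently` and the `fetch` callable are hypothetical names, and `fetch` is assumed to take a URL and return the page content.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_concurrently(urls, fetch, max_workers=8):
    """Run fetch(url) for each URL in a thread pool.

    `fetch` is a hypothetical callable supplied by the caller.
    Returns a dict mapping each URL to its result, or None if fetch raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failing page should not stop the whole crawl.
                results[url] = None
    return results
```

Any shared state, such as a database connection, would need its own locking or a connection per thread; the sketch leaves that out.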
I looked at the source; I think it needs improvement, such as handling SQL injection and other kinds of injection. Maybe this would be good: def test(self): c = CrawlerDb() c.connect() # c.enqueue(['a12222', '11'])...
Example: ``` ogo_website_200x60_1fe006a7-6795-4ef6-ab21-77ce25ef0772_160x@2x.png ``` A possible solution may be to add an allow list of TLDs.
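The allow-list idea could look roughly like the Python 3 sketch below. The regular expression, the `ALLOWED_TLDS` set, and the `find_emails` name are all assumptions for illustration and are not taken from the project; the TLD list in particular is a placeholder to extend.

```python
import re

# Hypothetical allow list; extend it with the TLDs you actually expect.
ALLOWED_TLDS = {"com", "org", "net", "edu", "gov", "au", "uk"}

# Simple email pattern; the final group captures the last domain label.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.([A-Za-z]{2,})")


def find_emails(text, allowed_tlds=ALLOWED_TLDS):
    """Return the set of email-like strings whose TLD is in the allow list."""
    emails = set()
    for match in EMAIL_RE.finditer(text):
        tld = match.group(1).lower()
        if tld in allowed_tlds:
            emails.add(match.group(0).lower())
    return emails
```

With this filter, an image file name such as the one above is rejected because `png` is not an allowed TLD, while `info@example.com` is kept.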
Traceback (most recent call last): File "email_crawler.py", line 12, in logging.config.dictConfig(LOGGING) File "C:\Python27\lib\logging\config.py", line 794, in dictConfig dictConfigClass(config).configure() File "C:\Python27\lib\logging\config.py", line 576, in configure '%r: %s' % (name, e)) ValueError:...
[00:19:27] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=something&start=0 HTTP Error 503: Service Unavailable [00:19:27] ERROR::email_crawler - EXCEPTION: expected string or buffer
I have an example URL: `http://www.website.com/Search/in/Alderley Edge`, and the crawler failed to fetch the page even though it exists. I got this as the response: `[22:57:43] ERROR::email_crawler - Exception at url: http://www.website.com/Search/in/Alderley Edge HTTP...
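One plausible cause, not confirmed by the report, is the unescaped space in the path. The Python 3 sketch below percent-encodes the path and query before a request is made; `normalize_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_url(url):
    """Percent-encode unsafe characters (such as spaces) in path and query."""
    parts = urlsplit(url)
    # Keep "/" and "%" so separators and existing escapes are left alone.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact.
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For the URL in this report, the helper returns `http://www.website.com/Search/in/Alderley%20Edge`.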
I ran this program with several commands, and it not only didn't save anything: ``` dave@dave-HP-EliteBook-8560w:~/Code/Downloaded/python-email-crawler-master$ python email_crawler.py "iphone developers" [01:53:42] INFO::email_crawler - ---------------------------------------- [01:53:42] INFO::email_crawler - Keywords to Google...
After crawling, I always get the same **0 emails** message: [11:41:50] INFO::email_crawler - ======================================== [11:41:50] INFO::email_crawler - Processing... [11:41:50] INFO::email_crawler - There are 0 emails [11:41:50] INFO::email_crawler - All emails saved...
Currently, `find_links_in_html_with_same_hostname` also follows links to different hostnames.
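A strict hostname check might look like the Python 3 sketch below. It does not reproduce the project's function, whose signature is not shown here; `keep_same_hostname` is a hypothetical helper that takes the page URL and a list of already extracted links.

```python
from urllib.parse import urljoin, urlsplit


def keep_same_hostname(base_url, links):
    """Return absolute links whose hostname equals that of base_url."""
    base_host = urlsplit(base_url).hostname
    kept = []
    for link in links:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(base_url, link)
        if urlsplit(absolute).hostname == base_host:
            kept.append(absolute)
    return kept
```

Comparing parsed hostnames, rather than checking whether the hostname string appears somewhere in the link, avoids accepting URLs that merely mention the host in a path or query.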