PyCrawler
A Python web crawler
There are small typos in:
- README.md

Fixes:
- Should read `whether` rather than `wether`.
- Should read `approach` rather than `apprach`.

Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
Sorry, I always get this problem. Is it my fault or a bug? It seems the exception only happens when the page contains Chinese text.
That's just wrong. :warning: There are XML/HTML parsers like lxml or Beautiful Soup. See references:
- http://stackoverflow.com/a/1732454/851737
- http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html (more technical
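To illustrate the parser-based approach suggested above, here is a minimal sketch that collects links with a real HTML parser. It is not PyCrawler's code: it assumes Python 3 and uses the standard library's `html.parser` so that it runs without extra dependencies (lxml or Beautiful Soup would fill the same role), and the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and unescapes
        # entities in attribute values before calling this method.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link target found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The parser copes with mixed-case tags, single-quoted attributes, and escaped entities, which are the usual places where a hand-written pattern breaks.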
Keywords
When the crawler runs until it raises a UnicodeDecodeError, the script has crawled a few sites; however, when I check the database for the keywords, there are lots of empty entries and max 1...
Using a Linux box with SQLite:
[21:15:00] INFO::PyCrawler - Starting (http://www.dmoz.org)...
[21:15:00] ERROR::PyCrawler - EXCEPTION: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
Traceback (most recent call...
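The `'ascii' codec can't decode byte 0xc3` message above is what Python reports when non-ASCII bytes are decoded with the ASCII codec; 0xc3 is a common lead byte in UTF-8 text. One possible way to avoid it is to decode the response bytes explicitly before handling them as text. The sketch below is an assumption about how that could look in Python 3, not a description of PyCrawler's code; `decode_body` and `declared_charset` are hypothetical names.

```python
def decode_body(raw, declared_charset=None):
    """Turn raw response bytes into text without relying on the ASCII codec.

    `declared_charset` is whatever the server or page announced, if anything.
    """
    for encoding in (declared_charset, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # Last resort: keep going, substituting U+FFFD for undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

Falling back to `errors="replace"` trades exact fidelity for a crawl that keeps running; whether that is acceptable depends on how the extracted keywords are used.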