PyCrawler

A Python web crawler

Results 6 PyCrawler issues

There are small typos in:
- README.md

Fixes:
- Should read `whether` rather than `wether`.
- Should read `approach` rather than `apprach`.

Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

Sorry, I always get this problem. Is it my fault or a bug? It seems the exception only happens when there is Chinese text.

That's just wrong. :warning: There are XML/HTML parsers like lxml or Beautiful Soup. See references:
- http://stackoverflow.com/a/1732454/851737
- http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html (more technical
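
The linked Stack Overflow answer argues against parsing HTML with regular expressions. As a rough illustration of the parser-based alternative, here is a minimal sketch of link extraction. It swaps in the standard library's `html.parser` so it runs without extra installs; lxml or Beautiful Soup, as suggested above, would fill the same role. The names `LinkExtractor` and `extract_links` are hypothetical and are not part of PyCrawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links(page_text, "http://example.com/")` returns a list of absolute URLs, skipping anchors that have no `href`.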

The crawler runs until it raises a UnicodeDecodeError. The script crawled a few sites; however, when I check the database for the keywords, there are lots of empty entries and max 1...

Using a Linux box with SQLite:
[21:15:00] INFO::PyCrawler - Starting (http://www.dmoz.org)...
[21:15:00] ERROR::PyCrawler - EXCEPTION: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
Traceback (most recent call...
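
The byte `0xc3` in the error above is the lead byte of a two-byte UTF-8 sequence, which suggests the page was UTF-8 but was decoded as ASCII. A common way to avoid this class of error is to decode the fetched bytes explicitly, with a lossy fallback. The following is only a sketch under that assumption; `decode_body` and `declared_charset` are hypothetical names, not PyCrawler's actual code.

```python
def decode_body(raw, declared_charset=None):
    """Decode response bytes to str without raising UnicodeDecodeError.

    Tries the charset declared by the server (if any), then UTF-8, and
    finally falls back to UTF-8 with undecodable bytes replaced by U+FFFD.
    """
    candidates = [c for c in (declared_charset, "utf-8") if c]
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    return raw.decode("utf-8", errors="replace")
```

The fallback trades accuracy for robustness: a page in an undeclared legacy encoding is stored with replacement characters instead of aborting the crawl.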