PyCrawler issues

docs: Fix a few typos

There are small typos in:
- README.md

Fixes:
- Should read `whether` rather than `wether`.
- Should read `approach` rather than `apprach`.

Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

timgates42

UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 58: invalid start byte

1

Sorry, I always get this problem. Is it my fault or a bug? The exception seems to happen only when the page contains Chinese text.

chriszeng87
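One possible way to avoid the `UnicodeDecodeError` reported above is to decode fetched bytes explicitly, trying a short list of candidate encodings before falling back to a lossy decode. This is only a sketch, not PyCrawler's actual code: the helper name `decode_page` and the choice of `gb18030` as the second candidate (a guess based on the report that the failure appears with Chinese pages) are assumptions.

```python
def decode_page(raw, encodings=("utf-8", "gb18030")):
    """Decode raw page bytes to text (hypothetical helper, not part of PyCrawler).

    Each candidate encoding is tried in order. If none succeeds, the bytes
    are decoded as UTF-8 with undecodable bytes replaced by U+FFFD, so the
    crawler can keep going instead of raising UnicodeDecodeError.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


# Bytes that are not valid UTF-8 but are valid GB18030.
sample = "温度".encode("gb18030")
text = decode_page(sample)
```

If the server sends a charset in its `Content-Type` header, putting that charset first in `encodings` is probably a better starting point than a fixed list.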

Removed anchor links from queue in ready_queue

John61590
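The change named above, dropping anchor links before they reach the queue, could look roughly like the following. This is a sketch under assumptions: `strip_fragments` is a hypothetical helper and does not reflect how `ready_queue` is actually implemented. It relies on `urllib.parse.urldefrag` from the standard library to split off the `#fragment` part.

```python
from urllib.parse import urldefrag


def strip_fragments(urls):
    """Remove #fragment parts and drop resulting duplicates, keeping order.

    Hypothetical helper for illustration; links that consist only of a
    fragment (such as "#top") become empty and are skipped.
    """
    seen = set()
    result = []
    for url in urls:
        base, _fragment = urldefrag(url)
        if base and base not in seen:
            seen.add(base)
            result.append(base)
    return result


queue = strip_fragments([
    "http://example.com/a#top",
    "http://example.com/a",
    "#only",
    "http://example.com/b#x",
])
```

Since two URLs that differ only in their fragment point at the same document, removing fragments before deduplication keeps the crawler from fetching the same page twice.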

Don't parse HTML with RegEx

2

That's just wrong. :warning: There are XML/HTML parsers like lxml or Beautiful Soup. See references:
- http://stackoverflow.com/a/1732454/851737
- http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html (more technical

schlamar
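To illustrate the suggestion above, here is a minimal sketch of link extraction with a real parser instead of a regular expression. It uses the standard library's `html.parser` rather than lxml or Beautiful Soup (the libraries named in the issue) only so the example has no dependencies; the class and function names are hypothetical and not taken from PyCrawler.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (illustrative, not PyCrawler code)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every anchor tag in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links('<p><a href="/one">1</a> <A HREF=\'/two\' class="x">2</A></p>')
```

The parser copes with mixed case, either quote style, and extra attributes without any special handling, which is where regex-based extraction tends to break.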

Keywords

1

The crawler runs until it raises a UnicodeDecodeError. By then the script has crawled a few sites; however, when I check the database for the keywords, there are lots of empty entries and max 1...

ghost

Code works out of the box for a while, then gives this error

1

Using a Linux box with sqlite:
[21:15:00] INFO::PyCrawler - Starting (http://www.dmoz.org)...
[21:15:00] ERROR::PyCrawler - EXCEPTION: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)
Traceback (most recent call...

ghost
