python-goose HtmlFetcher does not handle gzip compression

Open kqr opened this issue 9 years ago • 2 comments

Some servers force gzip compression on their content, which HtmlFetcher does not handle gracefully because urllib2 assumes the response body is uncompressed. The cheapest/easiest solution would be to check the Content-Encoding header on the response and decompress the body with zlib if it is gzipped. A more ambitious/heavier solution would be to move from urllib2 to something like requests.
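A minimal sketch of the cheaper option follows. It assumes Python 3, where urllib2 became `urllib.request`. The helper names `decompress_body` and `fetch_html` are hypothetical and are not part of python-goose. The sketch also handles `deflate`, which the issue does not mention.

```python
import urllib.request
import zlib


def decompress_body(body, content_encoding):
    """Return `body` decoded according to its Content-Encoding header value.

    Hypothetical helper: gzip and deflate are decoded; a missing or
    unrecognised encoding returns the bytes unchanged.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # Standard "deflate" is a zlib-wrapped stream.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate data without the zlib wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body


def fetch_html(url, timeout=10):
    """Hypothetical fetch helper: download `url` and return decoded body bytes."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # urllib does not decompress for us, so inspect the header ourselves.
        return decompress_body(raw, response.headers.get("Content-Encoding"))
```

For comparison, requests decodes gzip and deflate response bodies automatically when you read `response.content` or `response.text`. That is the main practical argument for the heavier switch.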

Aug 11 '15 10:08 kqr

Requests: 72929331d44309f9002ae0dd3cd268cfddb0e43f

Jan 13 '16 09:01 Lol4t0

Awesome! Let's hope it gets merged.

Jan 13 '16 10:01 kqr