
utils.web.getEncoding() always returning 'None' in Web plugin

Open Rodrigo-NH opened this issue 5 years ago • 0 comments

Hi. While trying to find out why NBSP (non-breaking space) decodes incorrectly when a page's charset is iso8859-1, I discovered that in the Web plugin, currently at line 155, "text = text.decode(utils.web.getEncoding(text) or 'utf8', 'replace')", the call utils.web.getEncoding(text) always returns 'None'. I tried a couple of different pages with the same result: getEncoding is not able to return the actual encoding. Example of the problem: the title of the page 'https://www.freebsd.org/doc/handbook/usb-device-mode-terminals.html' contains an NBSP that is correctly encoded as iso8859-1. If I explicitly set the decoding to iso8859-1 in the code, the Web plugin returns the title correctly.
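The symptom can be reproduced outside the bot with a minimal sketch. The page fragment and the `sniff_charset` helper below are hypothetical illustrations, not Limnoria's code; the helper only shows one way a charset declaration could be read from raw bytes and makes no claim about how getEncoding() works internally.

```python
import re

# Hypothetical page fragment encoded as ISO-8859-1.
# The NBSP (U+00A0) becomes the single byte 0xA0.
raw = "<title>USB\u00a0Device Mode</title>".encode("iso8859-1")

# What the quoted line does when the detected encoding is None:
# it falls back to 'utf8', where a lone 0xA0 byte is invalid.
as_utf8 = raw.decode(None or "utf8", "replace")

# Decoding with the page's real charset keeps the NBSP intact.
as_latin1 = raw.decode("iso8859-1", "replace")


def sniff_charset(data):
    """Hypothetical helper (an assumption, not Limnoria's getEncoding()).

    Looks for a 'charset=...' declaration in the raw bytes and returns
    its name as a str, or None when nothing is declared.
    """
    match = re.search(rb'charset=["\']?([\w-]+)', data, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else None


meta = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(repr(as_utf8))
print(repr(as_latin1))
print(sniff_charset(meta))
```

With the utf8 fallback the NBSP is replaced by U+FFFD, which matches the broken title; with iso8859-1 it survives as U+00A0.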

I haven't looked at getEncoding() yet to try to find the issue (in case it really is a getEncoding() issue).

The current (running) version of this Limnoria is installed on 2019-01-24T22-12-03, running on Python 3.6.8 (default, Jan 3 2019, 01:10:23) [GCC 4.2.1 Compatible FreeBSD Clang 6.0.0 (tags/RELEASE_600/final 326565)]. The newest versions available online are 2019.02.22 (in master), 2019.02.22 (in testing).

Thanks!

Rodrigo-NH, Feb 24 '19 18:02