Use chardet to make an intelligent encoding guess

Open depassp opened this issue 11 years ago • 2 comments

We still use sickbeard.SYS_ENCODING, but if that fails, try chardet before dumping out completely.
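A rough sketch of the fallback described above, not the actual patch. It is written for Python 3 `bytes`; the function name `decode_with_fallback` and the injectable `detect` parameter are made up for illustration. The only library call assumed is `chardet.detect`, which returns a dict with an `encoding` key.

```python
def decode_with_fallback(raw, sys_encoding, detect=None):
    """Decode raw bytes with sys_encoding; if that fails, try a detector's guess.

    Hypothetical helper for illustration only. Returns None when nothing works.
    """
    # First choice: the system encoding (sickbeard.SYS_ENCODING in the project).
    try:
        return raw.decode(sys_encoding)
    except (UnicodeDecodeError, LookupError):
        pass

    # Second choice: ask chardet for a guess, unless a detector was passed in.
    if detect is None:
        import chardet  # third-party package, imported only when needed
        detect = chardet.detect

    guess = detect(raw).get("encoding")
    if guess:
        try:
            return raw.decode(guess)
        except (UnicodeDecodeError, LookupError):
            pass

    # Give up; the caller decides how to log or handle this.
    return None
```

Passing `detect` as a parameter keeps the guessing step swappable, so the fallback can be exercised without chardet installed.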

This fixes bug 2253: https://code.google.com/p/sickbeard/issues/detail?id=2253

depassp avatar Apr 13 '13 05:04 depassp

We already handle this with encoding-kludge. Most likely there is a log entry or function that's not using it. We just need the traceback to fix the issue; there's no need to add another lib on top of this.

thezoggy avatar Apr 14 '13 10:04 thezoggy

sickbeard.helpers.encodingKludge is the only function that writes "Unable to decode value:" to the log

This error is showing up in the bug reporter's log.

Therefore the offending function is using fixStupidEncodings (or ek).

This error means whatever bytestring we attempted to decode was not encoded with the user's sickbeard.SYS_ENCODING (which comes from locale.getpreferredencoding, and is hopefully 'utf-8').

In this particular case, the bytestring 'House - 3x06 - Que Ser\xe1 Ser\xe1.avi' appears to be encoded in cp1252 (and not utf-8). cp1252 is the default encoding for Windows US English.
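To illustrate the point about this filename, here is a small check in Python 3 syntax (the project itself ran on Python 2 at the time, so this is only a demonstration). The variable names are arbitrary.

```python
raw = b'House - 3x06 - Que Ser\xe1 Ser\xe1.avi'

# 0xE1 starts a multi-byte sequence in utf-8, but the next byte here is
# a space or a dot, not a continuation byte, so strict decoding fails.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In cp1252 the byte 0xE1 maps to U+00E1 (a with acute accent).
as_cp1252 = raw.decode('cp1252')

print(utf8_ok)
print(as_cp1252)
```

Note that latin-1 decodes these bytes to the same text, so the bytes alone cannot tell cp1252 from latin-1; that is why "appears to be" is the right level of certainty.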

This raises the more general question: What should we do about a bytestring with an unknown encoding?

My solution: Take a guess. Chardet does this well.

It might be better, though, to investigate why/how a file can get downloaded/saved with a filename that is not encoded in the user's locale.

depassp avatar Apr 14 '13 17:04 depassp