VideoSort problems with "special" characters

With every download that contains characters such as é, ä or ß, this script fails and I get error messages like this:

VideoSort: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)

Is there any way to handle these kinds of characters correctly?

Thanks, kalle

Sep 01 '15 06:09 kalle-del-haye

Are you sure you use the latest version?

Sep 01 '15 07:09 hugbug

The About VideoSort text says:

PP-Script Version: 6.1.

Downloaded from GitHub one or two weeks ago.

Sep 01 '15 12:09 kalle-del-haye

Please post the full log output; it should contain more exception info. What OS does it run on? Please send me an example nzb-file to [email protected].

Sep 02 '15 09:09 hugbug

The OS is Debian 7.8 with backport-kernel 3.16 and Python 2.7.3.

error   Sun Aug 30 2015 16:11:01   Post-process-script videosort/VideoSort.py for xxxxxxx failed
error   Sun Aug 30 2015 16:11:01   VideoSort: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
error   Sun Aug 30 2015 16:11:01   VideoSort: Failed: roor-vsdb-1080p-subs.rar
info    Sun Aug 30 2015 16:11:01   VideoSort: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
info    Sun Aug 30 2015 16:11:01   VideoSort: matcher = guessit.matcher.IterativeMatcher(unicode(guessfilename), filetype='autodetect', options={'nolanguage': True, 'nocountry': True})
info    Sun Aug 30 2015 16:11:01   VideoSort: File "/usr/share/nzbget/scripts/videosort/VideoSort.py", line 980, in guess_info
info    Sun Aug 30 2015 16:11:01   VideoSort: guess = guess_info(filename)
info    Sun Aug 30 2015 16:11:01   VideoSort: File "/usr/share/nzbget/scripts/videosort/VideoSort.py", line 1065, in construct_path
info    Sun Aug 30 2015 16:11:01   VideoSort: new_path = construct_path(old_path)
info    Sun Aug 30 2015 16:11:01   VideoSort: File "/usr/share/nzbget/scripts/videosort/VideoSort.py", line 1180, in <module>
info    Sun Aug 30 2015 16:11:01   VideoSort: Traceback (most recent call last):
info    Sun Aug 30 2015 16:11:01   Executing post-process-script videosort/VideoSort.py for xxxxxxx

The nzb-file is on the way.

Sep 02 '15 17:09 kalle-del-haye

The nzb-file is on the way.

I never got the email. Please send again.

Sep 10 '15 08:09 hugbug

Sent again, this time from a different account and with the nzb-file zipped.

Sep 10 '15 08:09 kalle-del-haye

I have the same error with this nzb: "Post-process-script videosort/VideoSort.py for Zoomania - Ganz schön ausgefuchst (2016) failed"

"VideoSort: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)"

"VideoSort: Failed: Zoomania - 1080p - AC3.mkv
INFO Mon Jun 06 2016 12:54:18 VideoSort: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
INFO Mon Jun 06 2016 12:54:18 VideoSort: matcher = guessit.matcher.IterativeMatcher(unicode(guessfilename), filetype='autodetect', options={'nolanguage': True, 'nocountry': True})
INFO Mon Jun 06 2016 12:54:18 VideoSort: File "/storage/nzbget/scripts/videosort/VideoSort.py", line 980, in guess_info
INFO Mon Jun 06 2016 12:54:18 VideoSort: guess = guess_info(filename)
INFO Mon Jun 06 2016 12:54:18 VideoSort: File "/storage/nzbget/scripts/videosort/VideoSort.py", line 1065, in construct_path"

Can you tell me what the problem is with this file? Zoomania - Ganz schön ausgefuchst (2016) {{XDx3rX6JGRvRgnJpRW}}.zip

Jun 06 '16 11:06 Sirvival21

The root problem is that in modern OSes, filenames can contain Unicode characters, pretty much universally byte-encoded as UTF-8, but VideoSort decodes them with a plain unicode(guessfilename) call. Because no encoding is specified, Python 2 falls back to its default ASCII codec, which only accepts bytes below 128. The 0xc3 in the error is the first byte of the UTF-8 encoding of characters such as é, ä, ö or ß. The fix is to call unicode(guessfilename, encoding='utf-8') instead. This should be safe and backward-compatible because UTF-8 is a superset of ASCII.
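To see the failure in isolation, here is a small sketch in Python 3 (VideoSort itself runs on Python 2, where unicode(data) without an encoding behaves like decoding with the 'ascii' codec). It encodes a title like the one from the 2016 report to UTF-8 and decodes the bytes both ways. decode_as is a hypothetical helper for illustration only, and this is not claimed to be the exact string VideoSort received.

```python
# Python 3 sketch; VideoSort itself is Python 2, where unicode(data)
# without an encoding uses the default 'ascii' codec.

def decode_as(data, encoding):
    """Decode data with the given codec, returning the error text on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)

# A filename as it would arrive from a UTF-8 filesystem: raw bytes.
raw = "Zoomania - Ganz schön ausgefuchst (2016)".encode("utf-8")

print(decode_as(raw, "ascii"))  # fails on 0xc3, the first byte of "ö"
print(decode_as(raw, "utf-8"))  # decodes cleanly
```

The first line printed is error: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128). The ö in this title happens to start at byte offset 19, so the message matches the log, while the UTF-8 decode returns the original title.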

Specifically, this line in VideoSort.py:

matcher = guessit.matcher.IterativeMatcher(unicode(guessfilename), filetype='autodetect', options={'nolanguage': True, 'nocountry': True})

needs to be replaced with this (or the equivalent):

guessfilename = unicode(guessfilename, encoding='utf-8')
matcher = guessit.matcher.IterativeMatcher(guessfilename, filetype='autodetect', options={'nolanguage': True, 'nocountry': True})
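One caveat with this replacement: in Python 2, unicode(x, encoding='utf-8') raises TypeError ("decoding Unicode is not supported") if x is already a unicode object, so the call is only safe when guessfilename is guaranteed to be a byte string. A defensive variant decodes only when needed. The sketch below is written in Python 3, and to_text is a hypothetical helper name, not something that exists in VideoSort.

```python
def to_text(name, encoding="utf-8"):
    """Return name as text, decoding it only if it is still raw bytes."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name  # already text: decoding it again would fail in Python 2

filename_bytes = "schön.mkv".encode("utf-8")
print(to_text(filename_bytes))  # bytes are decoded as UTF-8
print(to_text("schön.mkv"))     # text passes through unchanged
```

Because bytes is an alias for str in Python 2.6 and later, the same isinstance check would also work in VideoSort's Python 2 code, e.g. guessfilename = to_text(guessfilename) before building the matcher.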

Nov 01 '16 05:11 mikenerone

@mikenerone: Will it work on Windows too (as Windows doesn't use UTF-8 for file names)?

Jun 22 '17 19:06 hugbug
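The thread leaves the Windows question open. One possible approach, which is an assumption and not something VideoSort or guessit is known to do, is to try UTF-8 first and fall back to a legacy Windows code page when the bytes are not valid UTF-8. cp1252 is assumed here as the fallback because it is the usual ANSI code page on Western European Windows systems; the right choice depends on the system locale. decode_filename is a hypothetical name, and the sketch is Python 3.

```python
def decode_filename(raw, candidates=("utf-8", "cp1252")):
    """Try each candidate codec in order; fall back to lossy UTF-8 decoding."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: never crash, but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")

print(decode_filename("schön".encode("utf-8")))   # valid UTF-8
print(decode_filename("schön".encode("cp1252")))  # 0xf6 is invalid UTF-8, cp1252 succeeds
```

UTF-8 goes first because cp1252 accepts almost any byte sequence: trying it first would silently turn a UTF-8 "schön" into "schÃ¶n". The ordering is still only a heuristic, since some cp1252 byte sequences also happen to be valid UTF-8.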