mutagen
mid3v2 crashes with "UnicodeEncodeError: surrogates not allowed" on files with accented characters in the filename
I am trying to check whether ISRC (TSRC) tags are present in a large audio collection by running
mid3v2 -l 00*/*3 | grep -a TSRC
but mid3v2 dies halfway through, printing:
IDv2 tag info for 00-225167/mina - volami nel cuore.mp3
TIT2=Volami nel cuore
TPE1=MINA
TRCK=1
IDv2 tag info for Traceback (most recent call last):
  File "/usr/bin/mid3v2", line 33, in <module>
    sys.exit(load_entry_point('mutagen==1.46.0', 'console_scripts', 'mid3v2')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 484, in entry_point
    return main(sys.argv)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 469, in main
    list_tags(args)
  File "/usr/lib/python3/dist-packages/mutagen/_tools/mid3v2.py", line 335, in list_tags
    print("IDv2 tag info for", filename)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in position 13: surrogates not allowed
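This isn't Mina's fault: the crash is triggered by the next file, whose name is not valid UTF-8 but ANSI or CP437 encoded: "modà - la notte.mp3", where à is stored as the byte 0x85. Python maps that undecodable byte to the lone surrogate '\udc85', which print() then cannot encode as UTF-8. The same happens with other files whose names contain 0x8A for è, 0xB4 for é, 0x95 for ò, 0x97 for ù, 0xA2 for ó and so on.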
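The mechanism can be reproduced without mutagen. The following is a minimal sketch, not mutagen's code: it assumes the filename bytes are as described above and decodes them explicitly with the surrogateescape error handler, which is what Python applies to filenames and command-line arguments on POSIX systems. The helper name printable is hypothetical, and backslashreplace is just one possible way to make such a name safe to print.

```python
# Filename bytes as described in this report: 0x85 is "à" in CP437,
# but a lone 0x85 is not valid UTF-8.
raw_name = b"mod\x85 - la notte.mp3"

# Decode the way Python treats POSIX filenames under a UTF-8 locale:
# the undecodable byte 0x85 becomes the lone surrogate U+DC85.
name = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# A strict UTF-8 encode, which print() needs on a UTF-8 stdout, fails.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

# Encoding with surrogateescape restores the original bytes losslessly.
assert name.encode("utf-8", errors="surrogateescape") == raw_name


def printable(text: str) -> str:
    """Hypothetical helper: escape lone surrogates so the text can be printed."""
    return text.encode("utf-8", errors="backslashreplace").decode("utf-8")


print("IDv2 tag info for", printable(name))

# Decoding the raw bytes as CP437 recovers the intended accented name.
recovered = raw_name.decode("cp437")
```

A tool that only needs to display such a name could escape it as above, while one that needs to open the file should keep the surrogate-escaped string or the raw bytes unchanged.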
Environment: Debian GNU/Linux with LANG=en_GB.UTF-8; mutagen 1.46.0 according to the traceback.