rgain3
Fails on filenames that use a character encoding different from the system's
I have a friend whose audio collection predates the general availability of UTF-8 on operating systems. He also has a lot of music whose band, album and song names include non-ASCII characters. Combine the two and you get:
Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 6, in <module>
    collectiongain()
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 341, in collectiongain
    do_collectiongain(args[0], opts.ref_level, opts.force, opts.dry_run,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 274, in do_collectiongain
    collect_files(music_dir, files, visited_cache,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 117, in collect_files
    print(" [%i] %s |" % (i, filepath), end='')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udced' in position 49: surrogates not allowed
Notice that these are valid filenames from the OS point of view (on Unix, any byte except NUL (\x00) and / can be part of a path), just not valid UTF-8. Yes, he could sit down and rename all those files and directories, but I guess he isn't the only one with this problem.
OTOH, you could say 'go fix your filenames' and we will understand. Cheers!
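For readers who want to see the failure in isolation, here is a minimal sketch that involves no rgain3 code. It assumes a hypothetical file name, Día.mp3, stored as Latin-1 bytes. The actual encoding of the affected collection is not stated in this report, so Latin-1 is only an illustrative guess; it happens to produce the same '\udced' surrogate that appears in the traceback.

```python
# Hypothetical file name as a Latin-1 system would have stored it on disk.
raw_name = "Día.mp3".encode("latin-1")  # b'D\xeda.mp3'

# Python 3 decodes file names with the 'surrogateescape' error handler
# (PEP 383), so the undecodable byte 0xED becomes the lone surrogate U+DCED.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict encode, which is what print() does on a UTF-8 stdout, rejects it.
try:
    name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError as exc:
    strict_encode_failed = True
    print(exc)

# Encoding with the same error handler restores the original bytes exactly.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

The str form is therefore lossless as long as it is only passed back to the OS; it breaks only when it is written to a strictly encoded stream.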
Thanks for reporting.
Non-UTF-8 file names are definitely something the script should be able to deal with. You're probably right that your friend won't be the only one.
This problem should be solvable by making use of PEP 383.
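PEP 383 is the mechanism that already lets Python carry such names around as str: undecodable bytes are mapped to lone surrogates by the surrogateescape error handler. The names only break when they are encoded strictly, as print() does on a UTF-8 terminal. Below is a minimal sketch of one way to make a name safe for display. display_name is a hypothetical helper, not part of rgain3, and this is not necessarily how the project would fix the issue.

```python
import os
import sys


def display_name(path):
    """Return a printable version of a path that may contain lone surrogates.

    Hypothetical helper for illustration only.
    """
    # os.fsencode reverses the surrogateescape decoding and yields the
    # original bytes of the file name.
    raw = os.fsencode(path)
    # Decoding with errors="replace" turns every undecodable byte into
    # U+FFFD, which any UTF-8 stream can encode.
    return raw.decode(sys.getfilesystemencoding(), errors="replace")


print(" [%i] %s |" % (1, display_name("D\udceda.mp3")), end="")
```

The original path should still be used for all file operations; only the text shown to the user goes through the helper.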
This regression was probably introduced by commit 6de774076d76ded856c03968495b90001d293035
@StyXman could you try a Python3 compatible version prior to this commit?
git clone https://github.com/chaudum/rgain.git
cd rgain
git checkout aef5bde971c204d46e11a5f808aa4152cefa9687
python3 -m venv env
env/bin/python -m pip install -Ue .
@StyXman Unfortunately I have not been able to reproduce your issue yet. I tried creating files with random bytes as filenames, but that did not trigger it either; instead I ran into a different issue:
$ python
Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir()
['album-tag.mp3']
>>> os.rename('album-tag.mp3', os.urandom(4)+b'.mp3')
>>> os.listdir()
['\udcdb\udcc3\udcc0L.mp3']
$ env/bin/collectiongain /tmp/tmp.iEg1y395Tw
Collecting files ...
[1] ���L.mp3 |Test Album
Dispatching jobs ...
Now waiting for results ...
Unfortunately, there were some errors:
Test Album:Checking for Replay Gain information ...
/tmp/tmp.iEg1y395Tw/���L.mp3:none
Calculating Replay Gain information ...
Traceback (most recent call last):
  File "/home/christian/sandbox/chaudum/rgain/rgain3/replaygain.py", line 112, in do_gain
    tracks_data, albumdata = calculate_gain(files, ref_level)
  File "/home/christian/sandbox/chaudum/rgain/rgain3/replaygain.py", line 53, in calculate_gain
    rg.start()
  File "/home/christian/sandbox/chaudum/rgain/rgain3/lib/rgcalc.py", line 93, in start
    if not self._next_file():
  File "/home/christian/sandbox/chaudum/rgain/rgain3/lib/rgcalc.py", line 184, in _next_file
    self.src.set_property("location", fname)
TypeError: could not convert '/tmp/tmp.iEg1y395Tw/\udcdb\udcc3\udcc0L.mp3' to type 'gchararray' when setting property 'GstFileSrc.location'
0 successful, 1 failed.
All finished.
Could you provide information about your Python version and encoding?
python --version
python -c "import sys; print(sys.getfilesystemencoding(), sys.getdefaultencoding())"
locale
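The TypeError from GstFileSrc.location above has the same root cause as the print() failure: a str containing lone surrogates cannot be converted to a UTF-8 string. One possible way around it, sketched below, is to percent-encode the raw bytes into a file URI, which is pure ASCII. Whether rgain3's pipeline could take a URI (for example through a URI-based element such as uridecodebin) instead of a filesrc location is an assumption this thread does not confirm, and file_uri is a hypothetical helper.

```python
import os
from urllib.parse import quote


def file_uri(path):
    """Build an ASCII-only file:// URI from a path with arbitrary bytes.

    Hypothetical helper for illustration only.
    """
    # Recover the exact on-disk bytes, including any that were
    # surrogate-escaped when Python listed the directory.
    raw = os.fsencode(os.path.abspath(path))
    # quote() accepts bytes and percent-encodes every byte outside its
    # safe set, leaving "/" untouched by default.
    return "file://" + quote(raw)


print(file_uri("/tmp/\udcdb\udcc3\udcc0L.mp3"))
```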
Could you provide information about your Python version and encoding?
@StyXman :arrow_up:
Sorry, busy with life :(
mdione@diablo:~$ python3
Python 3.9.1+ (default, Jan 10 2021, 15:42:50)
[GCC 10.2.1 20201224] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ
environ({'LANGUAGE': 'en_US:es:fr:it', 'LANG': 'en_US.UTF-8', 'LC_TIME': 'es_AR.UTF-8'})
I was pretty sure at least LC_ALL would be en_US.UTF-8. I guess LANG is picked up instead?
Ah:
mdione@diablo:~$ python3 -c "import sys; print(sys.getfilesystemencoding(), sys.getdefaultencoding())"
utf-8 utf-8
mdione@diablo:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:es:fr:it
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME=es_AR.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
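On the question of which variable is picked up: under the POSIX rules, each locale category is taken from LC_ALL if it is set and non-empty, otherwise from the category's own variable (LC_CTYPE for character encoding), otherwise from LANG. The double-quoted values in the locale output are the ones implied this way rather than set explicitly. A small sketch of that lookup, using the environment posted above (effective_ctype is a hypothetical name for illustration):

```python
def effective_ctype(env):
    """Return (variable, value) that decides the character-encoding locale."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty both count as "not specified"
            return var, value
    # With nothing set, the default "C" (POSIX) locale applies.
    return None, "C"


env = {"LANGUAGE": "en_US:es:fr:it", "LANG": "en_US.UTF-8", "LC_TIME": "es_AR.UTF-8"}
print(effective_ctype(env))  # ('LANG', 'en_US.UTF-8')
```

So yes, with LC_ALL empty and LC_CTYPE unset, LANG decides the encoding here, and the result is UTF-8 either way.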
Thanks, I will try again to reproduce the issue on my machine.
I am also having this problem. My OS is Ubuntu 22.04.4. I installed rgain via apt install replaygain.
My failing output:
Collecting files ...
Traceback (most recent call last):
  File "/usr/bin/collectiongain", line 6, in <module>
    collectiongain()
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 341, in collectiongain
    do_collectiongain(args[0], opts.ref_level, opts.force, opts.dry_run,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 274, in do_collectiongain
    collect_files(music_dir, files, visited_cache,
  File "/usr/lib/python3/dist-packages/rgain3/script/collectiongain.py", line 117, in collect_files
    print(" [%i] %s |" % (i, filepath), end='')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcea' in position 53: surrogates not allowed
python3 --version:
Python 3.10.12
python3 -c "import sys; print(sys.getfilesystemencoding(), sys.getdefaultencoding())":
utf-8 utf-8
locale:
LANG=en_CA.UTF-8
LANGUAGE=en_CA:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
I am happy to report any other information that can help diagnose this problem.
Thanks for reporting.
Non UTF-8 file names are definitely something the script should be able to deal with. You're probably right, that your friend won't be the only one.
This problem should be solvable by making use of PEP 383.
How can I try to use PEP 383 to try to solve this issue?
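Independently of any change to the script, it can help to know which entries in a collection trigger the error. The sketch below walks a directory tree in bytes mode and reports names that are not valid UTF-8. It is a standalone diagnostic under the assumption that the file system encoding is UTF-8 (as in both reports above); the function names are hypothetical and nothing here comes from rgain3.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_names(root):
    """Yield raw byte paths of entries under root whose names are not UTF-8."""
    # Passing bytes to os.walk makes it return bytes, that is, the names
    # exactly as stored on disk with no decoding step in between.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


# Example use (the path is a placeholder):
#     for path in find_non_utf8_names("/path/to/music"):
#         print(repr(path))  # repr() of bytes is ASCII, so printing is safe
```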