cuetools
Tag problem and bad character errors when dealing with Japanese cue file
Version: cuetools 1.4.1-4 from Arch's repo
Similar issue with flacon. https://github.com/flacon/flacon/issues/176
I tried to use cuetag.sh problematic.cue *.flac to tag my split .flac files.
But every field (title, artist, etc.) ends up tagged with #### (number signs, i.e. hash or pound signs).
The tags are originally in Japanese, and the .flac files were split using shnsplit.
I haven't tried reproducing the problem with other Asian languages like Chinese or Korean.
I've also tried cueprint problematic.cue. It prints the track information correctly, with Japanese titles and so on, but shows these errors at the beginning:
bad character '�'
bad character '�'
bad character '�'
bad character 'R'
bad character 'E'
bad character 'M'
My default locale is en_US.UTF-8.
List of enabled locales (locale -a):
C
en_US.utf8
ja_JP.utf8
POSIX
zh_CN.utf8
zh_TW.utf8
The problematic .cue file: problematic.cue.zip
I don't see that anyone else has noticed the problem, so it could be just on my end.
edit: Removing the BOM only fixes the bad character error, not the tag problem.
Someone figured out that it's an encoding issue with flac library in the case of flacon. Could be just on Arch. https://github.com/flacon/flacon/issues/176#issuecomment-1078955451
edit: He also suggested workarounds that work for flacon. Probably can work for cuetools too.
I've run into the same error. In my case, it was caused by a BOM at the beginning of the CUE file (so the parser fails on three unrecognizable characters, followed by the first REM directive).
Stripping the BOM fixed the problem for me.
Thanks, @CircuitCoder.
I checked with file that the cue I posted has a byte order mark (BOM), and removed it with
sed '1s/^\xEF\xBB\xBF//' < problematic.cue > new.cue
as per https://unix.stackexchange.com/questions/381230/how-can-i-remove-the-bom-from-a-utf-8-file/381263#381263
Afterwards cueprint no longer reports the bad character error. I'll check this weekend whether cuetag.sh works.
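For anyone who wants to check for the BOM before editing, the detection and removal steps above can be sketched as two small shell functions. This is only a sketch: has_utf8_bom and strip_utf8_bom are hypothetical helper names, not part of cuetools, and it assumes head -c, od and tail -c behave as on a typical Linux system.

```shell
# has_utf8_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom() {
    # Dump the first three bytes as hex and compare against the BOM.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_utf8_bom SRC DST: copy SRC to DST, dropping a leading BOM if present.
strip_utf8_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 "$1" > "$2"   # start at byte 4, skipping the 3 BOM bytes
    else
        cat "$1" > "$2"          # no BOM: plain copy
    fi
}
```

Usage would be something like strip_utf8_bom problematic.cue new.cue, which should give the same result as the sed command when the BOM is present and leave other files unchanged.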
Ok, didn't work for the tag issue.
Basically I split a .wav file into flac this way:
shnsplit -f problematic.cue -o flac -t "%n %t" original.wav
This produces the flacs with the correct Japanese title.
Then I tag them with:
cuetag.sh problematic.cue *.flac
This produces flacs with #### (number sign, hash, pound sign) in their tags.
I have removed the BOM from problematic.cue as suggested, and cueprint no longer reports any bad character error.
Maybe metaflac needs --no-utf8-convert
Adding that option, together with importing tags from file, worked for me when tags contain cjk characters with metaflac.
When using cuetag.sh for batch tagging, I had to edit cuetag.sh and add that option to METAFLAC.
This behavior of metaflac actually seems really odd to me. I suppose one would not need to perform utf8 conversion on any locale already using utf-8. But I guess this is how metaflac works.
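A minimal sketch of the workaround described above, for tagging a single file by hand. tag_flac is a hypothetical helper name, not something cuetools provides, and the combination of --remove-all-tags with --import-tags-from=- is an assumption about a reasonable tagging call rather than a copy of what cuetag.sh does. It reads NAME=value lines from stdin and hands them to metaflac without charset conversion, so the input must already be UTF-8.

```shell
# tag_flac FILE: replace FILE's tags with NAME=value lines read from stdin.
# --no-utf8-convert makes metaflac store the bytes as given instead of
# converting them from the locale charset to UTF-8.
tag_flac() {
    metaflac --no-utf8-convert --remove-all-tags --import-tags-from=- "$1"
}
```

For example, printf 'ARTIST=%s\n' '喵' | tag_flac test.flac would be expected to store the artist as 喵 rather than ###, going by the reports in this thread.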
Sounds related to the sleuthing done on the flacon thread. https://github.com/flacon/flacon/issues/176#issuecomment-1078955451
It's probably a bug with UTF-8 conversion of the flac library. But dunno.
Yes. It seems like when building using CMake (which is what ArchLinux is doing), flac never uses LANGINFO (Compare https://github.com/xiph/flac/blob/master/CMakeLists.txt with https://github.com/xiph/flac/blob/master/build/config.mk#L157)
I'm going to open an issue there. Thanks for your pointers!
I just manually added -DHAVE_LANGINFO_CODESET into flac's CMakeLists.txt. This is the test result (the newly built version sits in /usr/local)
➜ tmp metaflac test.flac --set-tag=ARTIST=喵
➜ tmp metaflac test.flac --set-tag=ARTIST=喵 --no-utf8-convert
➜ tmp CHARSET=UTF-8 metaflac test.flac --set-tag=ARTIST=喵
➜ tmp /usr/local/bin/metaflac test.flac --set-tag=ARTIST=喵
➜ tmp metaflac test.flac --export-tags-to=- --no-utf8-convert
ARTIST=###
ARTIST=喵
ARTIST=喵
ARTIST=喵
This is the expected result: when flac is not compiled with langinfo, it falls back to reading the CHARSET environment variable to determine the charset. When compiled with langinfo, flac can actually read the locale, which solves the issue.
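Until a fixed build is available, the CHARSET fallback can be wrapped so it does not have to be typed every time. This is a rough sketch only: metaflac_with_charset is a hypothetical name, and it assumes that locale charmap reports the codeset of the current locale (UTF-8 for en_US.UTF-8).

```shell
# metaflac_with_charset ARGS...: run metaflac with CHARSET set for that call.
# An existing CHARSET value wins; otherwise the locale's codeset is used.
metaflac_with_charset() {
    CHARSET="${CHARSET:-$(locale charmap)}" metaflac "$@"
}
```

With that defined, metaflac_with_charset test.flac --set-tag=ARTIST=喵 should behave like the CHARSET=UTF-8 call in the test above on a UTF-8 locale.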
Thanks a lot for the explanation, @CircuitCoder.