flac2mp3
flac2mp3 copied to clipboard
Multibyte tags corrupts ID3v2.3 tags in mp3
If the input flac contains strange multibyte characters it results in the output mp3 to have bad id3v2.3 tags.
Some tag parsers do not care about this. mediainfo
shows the full tags. but ffprobe
won't it stops outputting tags after the first bad tag.
How to reproduce?
Make an empty directory for the test input
mkdir input-dir
cd input-dir
Find a random flac, eg.
curl -O https://filesamples.com/samples/audio/flac/sample1.flac
Add the problematic tag
metaflac sample1.flac --set-tag='TITLE=It’s Broken'
# add some other tags to show the effect of the bug
metaflac sample1.flac --set-tag='ARTIST=Some Artist'
metaflac sample1.flac --set-tag='ALBUM=Some Album'
cd ..
Convert like normal
flac2mp3.pl input-dir output-dir
Check the tags in ffprobe: ffprobe -v quiet -show_format -print_format ini output-dir/sample1.mp3
# ffprobe output
[format]
filename=output-dir/sample1.mp3
nb_streams=1
nb_programs=0
format_name=mp3
format_long_name=MP2/3 (MPEG audio layer 2/3)
start_time=0.025057
duration=122.122449
size=2489344
bit_rate=163071
probe_score=51
[format.tags]
album=Some Album
title=It’s Broke
title
is missing its last character and artist
is completely missing. The problem seems to be the ’
character in the title, which is 3 bytes long.
Tags not gone, however: mediainfo -f --Details=1 output-lol/sample1.mp3 | grep TIT2 --after=10
00001F TIT2 - Title/songname/content description (36 bytes)
00001F Header (10 bytes)
00001F Frame ID: TIT2
000023 Size: 25 (0x00000019)
000027 Flags: 0 (0x0000)
000029 Tag alter preservation: 0 (0x0000000000000000)
000029 File alter preservation: 0 (0x0000000000000000)
000029 Read only: 0 (0x0000000000000000)
000029 Compression: 0 (0x0000000000000000)
000029 Encryption: 0 (0x0000000000000000)
000029 Grouping identity: 0 (0x0000000000000000)
000029 Text_encoding: 1 (0x01)
00002A Information: It’s Broken
Here Size=25 seems to be off by 1. This seems to result in some media players showing the correct tags (they don't care about the Size header) for example VLC. Others will instead just render the tags until the faulty frame, for example deadbeef.
After heaps of debugging.
It should be off by 1. At least in some implementations. But not by others. The added byte comes from unsynchronizing, ie removing byte patterns that looks like audio synchronization frames by adding an extra \x00
to it.
It turns out, however, that the spec is not clear on whether tags should be un-unsynced (ie removing that added \x00 byte) before or after parsing the id3 tags.
libid3tag (eg. libmad) for example assumes (just like MP3/Tag) that the idv3 tags are "un-unsynced" before parsing them. Whereas libmpg123 (eg. ffmpeg) appear to assume that they should be parsed "as-is".
Edit:
It seems that the few mp3s with special characters that I checked (russian, korean, chinese) don't set the unsynchronisation flag. So my proposed solution would be disable id3v2_unsync. flac2all
(which sets tags using with lame) for example does not enable unsync as it seems.