[Bug] Non-Latin1 characters in metadata are not always displayed correctly
Bug Summary
When files contain non-Latin-1 characters in their tags, some of those tags (interestingly, not all!) are not displayed correctly. See the "Screenshots" section below for an example.
Version
1.7.1 (from F-Droid)
Steps to reproduce the bug
Have some files containing non-latin1 characters in their metadata :-)
Screenshots or Screen recordings
Example: I have an album by "Sašo Avsenik und seine Oberkrainer". When I search for "Avsenik" in "Artists", I get two distinct matches: one with the real "š" character, and one where the Unicode character is decoded with the wrong encoding:
Consequently, that one album is split in two:
and
When you open a UTF-8 text file containing "Sašo" as ISO-8859-1, you get "SaÅ¡o", so I think ISO-8859-1 is being used here instead of UTF-8.
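The mismatch described above can be reproduced in a few lines. This is only a minimal sketch of the suspected decoding error, not the app's or Android's actual code; the variable names are made up for illustration.

```python
# Encode the artist name the way the tag stores it (UTF-8),
# then decode the bytes with the wrong charset (ISO-8859-1).
original = "Sašo"
utf8_bytes = original.encode("utf-8")        # b'Sa\xc5\xa1o'
misread = utf8_bytes.decode("iso-8859-1")    # each byte becomes one character

print(misread)  # SaÅ¡o

# The damage is reversible as long as no bytes were dropped:
# re-encode as ISO-8859-1 and decode as UTF-8.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired)  # Sašo
```

The two bytes of "š" (0xC5 0xA1) turn into the two characters "Å" and "¡", which matches the garbled artist name.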
Other helpful information
All files are Opus files tagged via MusicBrainz, so the metadata is consistent across all of them. Here's the opusinfo output for two of the "different artist" files:
$ opusinfo "02. Wann kommst du zu mir.opus"
Processing file "02. Wann kommst du zu mir.opus"...
New logical stream (#1, serial: 29e0da3a): type opus
Encoded with libopus 1.4, libopusenc 0.2.1
User comments section follows...
...
ALBUM=Polkaklang ein Leben lang!
ALBUMARTIST=Sašo Avsenik und seine Oberkrainer
ALBUMARTISTSORT=Avsenik, Sašo und seine Oberkrainer
ARTIST=Sašo Avsenik und seine Oberkrainer
ARTISTS=Sašo Avsenik und seine Oberkrainer
ARTISTSORT=Avsenik, Sašo und seine Oberkrainer
...
TITLE=Wann kommst du zu mir
...
and
$ opusinfo "07. Wenn Mädchen träumen.opus"
Processing file "07. Wenn Mädchen träumen.opus"...
New logical stream (#1, serial: 27b1e98e): type opus
Encoded with libopus 1.4, libopusenc 0.2.1
User comments section follows...
...
ALBUM=Polkaklang ein Leben lang!
ALBUMARTIST=Sašo Avsenik und seine Oberkrainer
ALBUMARTISTSORT=Avsenik, Sašo und seine Oberkrainer
ARTIST=Sašo Avsenik und seine Oberkrainer
ARTISTS=Sašo Avsenik und seine Oberkrainer
ARTISTSORT=Avsenik, Sašo und seine Oberkrainer
...
TITLE=Wenn Mädchen träumen
...
My wild guess is that the vorbiscomment content is not assumed to be UTF-8 encoded (the tag values are always UTF-8, cf. the Opus documentation and/or the Ogg Vorbis documentation).
It looks as if the program tries to guess the encoding based on the song title: "Wann kommst du zu mir" is pure ASCII, and for that file the "š" in the artist's name is garbled. In contrast, "Wenn Mädchen träumen" contains the non-ASCII "ä" character, and for that file the "š" is displayed correctly. All songs listed under the correctly parsed artist have titles containing non-ASCII characters, whereas all songs listed under the wrongly decoded artist have pure-ASCII titles.
Just guesswork though ;-)
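To illustrate the point that the tag values are always UTF-8, here is a minimal sketch of reading an OpusTags comment packet without any charset guessing. It assumes the packet has already been extracted from the Ogg stream; `parse_opus_tags` and `build_opus_tags` are hypothetical helper names, and this is not how Android or the app reads tags.

```python
import struct


def parse_opus_tags(packet: bytes) -> dict[str, list[str]]:
    """Parse an OpusTags packet, decoding every comment as UTF-8."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8
    # Vendor string: 32-bit little-endian length, then the string itself.
    (vendor_len,) = struct.unpack_from("<I", packet, pos)
    pos += 4 + vendor_len
    # Number of user comments.
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    tags: dict[str, list[str]] = {}
    for _ in range(count):
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        # No guessing: the comment is decoded as UTF-8 unconditionally.
        comment = packet[pos:pos + length].decode("utf-8")
        pos += length
        key, _sep, value = comment.partition("=")
        tags.setdefault(key.upper(), []).append(value)
    return tags


def build_opus_tags(vendor: str, comments: list[str]) -> bytes:
    """Build a packet for the demo below (hypothetical helper)."""
    vendor_bytes = vendor.encode("utf-8")
    out = b"OpusTags" + struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(comments))
    for comment in comments:
        data = comment.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return out


packet = build_opus_tags(
    "libopus 1.4",
    ["ARTIST=Sašo Avsenik und seine Oberkrainer", "TITLE=Wann kommst du zu mir"],
)
tags = parse_opus_tags(packet)
print(tags["ARTIST"][0])
```

With a fixed UTF-8 decode, the artist comes out the same whether or not the title contains non-ASCII characters.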
This seems to be an upstream issue with the underlying Android provided metadata extraction. I saw the very same problem using Phocid and also reported it there: https://github.com/TJYSunset/Phocid/issues/29#issuecomment-2555793444 .
A dev replied and advised me to enable "Advanced metadata extraction", which bypasses Android's functionality and uses the player's own, and this fixed the problem.
Just fyi, I reported the issue upstream: https://issuetracker.google.com/issues/385155398
Yes, it is an upstream issue, I already know that.
We read metadata from Android MediaStore, the database maintained by the Android OS. The exceptions are Song Detail and Tag Editor, where we can use a third-party library (jaudiotagger) to read tags.
Currently, we do not maintain our own database built from metadata read by third-party libraries, so this problem cannot be solved quickly. (That's why this issue was put aside for almost half a year.)
There IS a plan to maintain an independent database. Work on it started almost 2 years ago, but due to the historical burden of the Phonograph codebase, it is convoluted and has made no notable progress so far.
I plan to handle this issue after setting up the database, so the issue will not see any progress for a long time.