[Bug] Non-Latin1 characters in metadata are not always displayed correctly

Open l3u opened this issue 1 year ago • 2 comments

Bug Summary

When files contain non-Latin-1 characters in their tags, some of those tags (interestingly, not all!) are not displayed correctly. See the "Screenshots" section below for an example.

Version

1.7.1 (from F-Droid)

Steps to reproduce the bug

Have some files containing non-latin1 characters in their metadata :-)

Screenshots or Screen recordings

Example: I have an album by "Sašo Avsenik und seine Oberkrainer". When I search for "Avsenik" in "Artists", I get two distinct matches: one with the real "š" character, and one where the Unicode character is displayed in the wrong encoding:

Screenshot_20240623-091747

Consequently, that one album is split in two:

Screenshot_20240623-091756

and

Screenshot_20240623-091810

When you open a UTF-8 text file containing "Sašo" using ISO-8859-1, you get "SaÅ¡o", so I think ISO-8859-1 is used here instead of UTF-8.
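The round trip can be reproduced in a few lines of Python. This is only a sketch of the suspected mis-decoding, not the code path the app or Android actually takes:

```python
# Encode the artist's first name as UTF-8, the way Vorbis comments store it.
original = "Sašo"
utf8_bytes = original.encode("utf-8")  # "š" becomes the two bytes C5 A1

# Decoding those bytes as ISO-8859-1 turns every byte into one character.
garbled = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 maps all 256 byte values, so the damage is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)   # SaÅ¡o
print(repaired)  # Sašo
```

Because the garbled form can be turned back into the original, the bytes in the file itself are presumably fine and only the decoding step is off.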

Other helpful information

All files are Opus files tagged via MusicBrainz, so the metadata is consistent across all of them. Here's the opusinfo output for two files that end up under the two "different" artists:

$ opusinfo "02. Wann kommst du zu mir.opus"
Processing file "02. Wann kommst du zu mir.opus"...

New logical stream (#1, serial: 29e0da3a): type opus
Encoded with libopus 1.4, libopusenc 0.2.1
User comments section follows...
        ...
        ALBUM=Polkaklang ein Leben lang!
        ALBUMARTIST=Sašo Avsenik und seine Oberkrainer
        ALBUMARTISTSORT=Avsenik, Sašo und seine Oberkrainer
        ARTIST=Sašo Avsenik und seine Oberkrainer
        ARTISTS=Sašo Avsenik und seine Oberkrainer
        ARTISTSORT=Avsenik, Sašo und seine Oberkrainer
        ...
        TITLE=Wann kommst du zu mir
        ...

and

$ opusinfo "07. Wenn Mädchen träumen.opus"
Processing file "07. Wenn Mädchen träumen.opus"...

New logical stream (#1, serial: 27b1e98e): type opus
Encoded with libopus 1.4, libopusenc 0.2.1
User comments section follows...
        ...
        ALBUM=Polkaklang ein Leben lang!
        ALBUMARTIST=Sašo Avsenik und seine Oberkrainer
        ALBUMARTISTSORT=Avsenik, Sašo und seine Oberkrainer
        ARTIST=Sašo Avsenik und seine Oberkrainer
        ARTISTS=Sašo Avsenik und seine Oberkrainer
        ARTISTSORT=Avsenik, Sašo und seine Oberkrainer
        ...
        TITLE=Wenn Mädchen träumen
        ...

My wild guess is that the Vorbis comment content is not assumed to be UTF-8 encoded (the tag values are always UTF-8; cf. the Opus documentation and/or the Ogg Vorbis documentation).
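For reference, a minimal sketch of reading a Vorbis comment payload with UTF-8 hard-coded, as the format requires. It assumes the payload starts right after the 8-byte "OpusTags" magic of an Opus comment header; `parse_vorbis_comments` and `pack` are hypothetical helper names, and the payload below is hand-built for illustration rather than read from a real file:

```python
import struct

def parse_vorbis_comments(payload: bytes) -> dict[str, list[str]]:
    """Parse vendor string and comments; every string is decoded as UTF-8."""
    offset = 0

    def read_u32() -> int:
        # Lengths and counts are unsigned 32-bit little-endian integers.
        nonlocal offset
        (value,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        return value

    def read_utf8(length: int) -> str:
        nonlocal offset
        text = payload[offset:offset + length].decode("utf-8")
        offset += length
        return text

    read_utf8(read_u32())  # vendor string, not needed here
    tags: dict[str, list[str]] = {}
    for _ in range(read_u32()):
        field, _sep, value = read_utf8(read_u32()).partition("=")
        # Field names are case-insensitive; a field may occur several times.
        tags.setdefault(field.upper(), []).append(value)
    return tags

def pack(text: str) -> bytes:
    """Length-prefix a string the way the comment header stores it."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data

comments = [
    "ARTIST=Sašo Avsenik und seine Oberkrainer",
    "TITLE=Wann kommst du zu mir",
]
payload = (
    pack("libopus 1.4, libopusenc 0.2.1")
    + struct.pack("<I", len(comments))
    + b"".join(pack(c) for c in comments)
)
tags = parse_vorbis_comments(payload)
print(tags["ARTIST"])
```

With the charset fixed to UTF-8, the artist comes out the same regardless of what the title contains.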

It seems like the program tries to guess the encoding based on the song title: "Wann kommst du zu mir" is pure ASCII, and for this file the "š" in the artist's name comes out garbled. In contrast, "Wenn Mädchen träumen" contains the non-ASCII "ä" character, and for this file the "š" is displayed correctly. All song titles listed under the correctly parsed artist contain non-ASCII characters, whereas all titles listed under the wrongly decoded artist are pure ASCII.

Just guesswork though ;-)
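To make the hypothesis concrete, here is a toy model of such a heuristic. It is purely hypothetical (`guess_charset` and `decode_tags` are made-up names, and nothing here is taken from Android's actual extractor); it only shows that a title-based charset guess would produce exactly the observed split:

```python
def guess_charset(title_bytes: bytes) -> str:
    """Hypothetical heuristic: choose one charset per file from its title."""
    # A pure-ASCII title gives no hint, so the guess falls back to Latin-1.
    return "iso-8859-1" if title_bytes.isascii() else "utf-8"

def decode_tags(raw_tags: dict[str, bytes]) -> dict[str, str]:
    """Decode all tags of one file with the charset guessed from TITLE."""
    charset = guess_charset(raw_tags["TITLE"])
    return {field: value.decode(charset) for field, value in raw_tags.items()}

# Both files carry the identical UTF-8 encoded artist.
artist = "Sašo Avsenik und seine Oberkrainer".encode("utf-8")

track_02 = decode_tags({
    "TITLE": "Wann kommst du zu mir".encode("utf-8"),
    "ARTIST": artist,
})
track_07 = decode_tags({
    "TITLE": "Wenn Mädchen träumen".encode("utf-8"),
    "ARTIST": artist,
})

print(track_02["ARTIST"])  # garbled, starts with "SaÅ¡o"
print(track_07["ARTIST"])  # correct, starts with "Sašo"
```

Under this model, the same artist bytes decode to two different strings, which would explain why the library lists two artists.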

l3u, Jun 23 '24 07:06

This seems to be an upstream issue with the underlying Android-provided metadata extraction. I saw the very same problem using Phocid and also reported it there: https://github.com/TJYSunset/Phocid/issues/29#issuecomment-2555793444 .

A dev replied and advised me to enable "Advanced metadata extraction", which bypasses Android's functionality and uses the player's own extraction instead. This fixed the problem.

Just fyi, I reported the issue upstream: https://issuetracker.google.com/issues/385155398

l3u, Dec 20 '24 09:12

Yes, it is an upstream issue, I already know that.

We read metadata from the Android MediaStore, the database maintained by the Android OS. The only exceptions are Song Detail and Tag Editor, where we can use a third-party library (jaudiotagger) to read tags.

Currently, we do not maintain our own database that reads metadata via third-party libraries, so this problem cannot be solved quickly. (That's why this issue was set aside for almost half a year.)

There IS a plan to maintain an independent database. Work on it started almost 2 years ago, but due to the historical burden of the Phonograph codebase, it has been convoluted and has made no notable progress so far.

I plan to handle this issue after setting up the database, so it will not see any progress for a long time.

chr56, Dec 20 '24 10:12