Special characters displayed wrongly

Open grasdk opened this issue 1 year ago • 2 comments

Describe the bug

The Danish letters Ææ, Øø and Åå, the German ü, and probably other non-ASCII characters are not displayed correctly in the user interface when saved as "Region Person Display Name" or "Region Name" (I am not sure which one is actually read). The metadata was added by DigiKam 8.1.0 and, as far as I can gather, it is stored as UTF-8.

See screenshot and attached photo (borrowed from wikimedia)

exiftool output:

$ exiftool -codedcharacterset 2023-08-27-120000-example.jpg
Coded Character Set             : UTF8
$ exiftool -Region* -Keywords -XP* 2023-08-27-120000-example.jpg
Region Person Display Name      : Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå, RedHairedPerson SittingOnBench
Region Rectangle                : 0.611242, 0.453747, 0.0214017, 0.0391972, 0.226011, 0.500157, 0.019285, 0.0319849, 0.365945, 0.486359, 0.00470367, 0.00940734
Region Applied To Dimensions H  : 3189
Region Applied To Dimensions Unit: pixel
Region Applied To Dimensions W  : 4252
Region Area H                   : 0.0391972, 0.0319849, 0.00940734
Region Area Unit                : normalized, normalized, normalized
Region Area W                   : 0.0214017, 0.019285, 0.00470367
Region Area X                   : 0.621943, 0.235654, 0.368297
Region Area Y                   : 0.473346, 0.516149, 0.491063
Region Name                     : Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå, RedHairedPerson SittingOnBench
Region Type                     : Face, Face, Face
Keywords                        : Holiday, RedHairedPerson SittingOnBench, Person Carrying ChildInYellowDress, Pærsøn Åkessün Æñtestå
XP Keywords                     : Holiday;RedHairedPerson SittingOnBench;Person Carrying ChildInYellowDress;Pærsøn Åkessün Æñtestå

Photo/video (optional) that causes the bug

2023-08-27-120000-example

Screenshot

image

Note that the Keywords or XP Keywords are displayed correctly.

Used app version:

  • docker:latest

grasdk avatar Dec 01 '23 23:12 grasdk

Did some further testing. Saved more person metadata to the file using "Tag That Photo". This makes the display correct in PiGallery.

2023-08-27-120000-example-ttp

Once the data is rewritten by exiftool, PiGallery displays it wrongly. This goes for both the exiftool Windows executable and the Ubuntu version under WSL.

WSL (ubuntu)

$ cp 2023-08-27-120000-example-ttp.jpg 2023-08-27-120000-example-ttp-exifcopy.jpg
$ exiftool -all= -tagsfromfile @ -all:all -IPTC:All -XMP:All -ColorSpaceTags -F -codedcharacterset=utf8 2023-08-27-120000-example-ttp-exifcopy.jpg

2023-08-27-120000-example-ttp-exifcopy

cmd.exe (windows 10)

>copy 2023-08-27-120000-example-ttp.jpg 2023-08-27-120000-example-ttp-exifwincopy.jpg
>exiftool -all= -tagsfromfile @ -all:all -IPTC:All -XMP:All -ColorSpaceTags -F -codedcharacterset=utf8 2023-08-27-120000-example-ttp-exifwincopy.jpg

2023-08-27-120000-example-ttp-exifwincopy

When sorting and comparing the exif data as displayed by exiftool, there are no differences.

This is confusing, because I think "Tag That Photo" uses exiftool under the hood

grasdk avatar Dec 03 '23 00:12 grasdk

I had the chance to play around a bit.

Converting the variable "name" in line 487 of MetadataLoader.ts from ASCII to UTF-8 at least seems to fix the problem when viewed in the log. Without this conversion, the same wrong characters show up in the log as in the UI: https://github.com/bpatrik/pigallery2/blob/3489f1d55ad4b7a5e83149887c665f7a5beddef0/src/backend/model/fileaccess/MetadataLoader.ts#L487C18-L487C18

				Logger.info(LOG_TAG, 'name:                                     ' + name);
				Logger.info(LOG_TAG, 'name converted from ascii to utf-8:       ' + Buffer.from(name, 'ascii').toString('utf-8'));
				Logger.info(LOG_TAG, 'name converted from ascii to utf-8 twice: ' + Buffer.from(Buffer.from(name, 'ascii').toString('utf-8'), 'ascii').toString('utf-8'));

the output is: image

So it could be that the library that reads the metadata assumes it is ASCII-encoded, which is why the conversion works. According to https://exiftool.org/TagNames/MWG.html, the EXIF specification defines these string values as ASCII, whereas the MWG recommends storing them as UTF-8, and exiftool follows that recommendation. This mismatch may be the cause of the assumed ASCII format.

Contrary to the EXIF specification, the MWG recommends that EXIF "ASCII" string values be stored as UTF-8. To honour this, the exiftool application sets the default internal EXIF string encoding to "UTF8" when the MWG module is loaded, but via the API this must be done manually by setting the CharsetEXIF option.
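
The suspected mechanism can be reproduced in isolation. Below is a minimal sketch, assuming the metadata library decodes the UTF-8 bytes one byte per character (Latin-1 style); that is a guess about its behaviour, not something confirmed in its source. The function names are hypothetical and are not part of pigallery2. As far as I understand the Node.js documentation, 'ascii' is treated like 'latin1' when a string is encoded into a Buffer, which would explain why the 'ascii' conversion in the log statements above restores the names.

```typescript
// Hypothetical helpers for illustration only; not pigallery2 code.

// Simulates a reader that treats each UTF-8 byte as one Latin-1 character.
function misreadUtf8AsLatin1(text: string): string {
  return Buffer.from(text, 'utf-8').toString('latin1');
}

// Reverses the misreading: map each character back to a single byte,
// then decode the resulting byte sequence as UTF-8.
function repairMojibake(garbled: string): string {
  return Buffer.from(garbled, 'latin1').toString('utf-8');
}

const original = 'Pærsøn Åkessün Æñtestå';
const garbled = misreadUtf8AsLatin1(original);
const repaired = repairMojibake(garbled);

console.log('garbled: ', garbled);
console.log('repaired:', repaired);
console.log('round trip ok:', repaired === original);
```

Applying the repair to a string that was decoded correctly in the first place would damage it, so a real fix is better placed where the bytes are first decoded than applied as a blanket conversion afterwards.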

I'm not yet comfortable enough with the code to suggest a solution and create a pull request with a correction, but wanted to share my findings.

grasdk avatar Dec 13 '23 23:12 grasdk

Fixed with https://github.com/bpatrik/pigallery2/pull/826

grasdk avatar Mar 21 '24 12:03 grasdk