pyexiftool

decoding to utf-8 issues

Open noah opened this issue 8 years ago • 3 comments

I am using your excellent library to extract EXIF from a largish repository of images (100k+). I've encountered an encoding-related issue. Basically exiftool returns a garbage tag value and it breaks the call to decode('utf-8') in execute_json().

If I'm reading it correctly, your code assumes that whatever it reads from exiftool will be capable of being decoded as utf-8 (and is valid JSON). But this does not always seem to be the case:

% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG
SerialNumber                    : #ທ.L.9.-.<.#K%
% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG > file
% cat -v test.json 
Serial Number                   : M-O;#M-`M-:M-^W.M--M-OM-ILM-i}.9.-M-..M-vM-^PM-=M-#<.M-^QM-dG#M-%K%
% exiftool -j -SerialNumber P3090087.JPG     
[{
  "SourceFile": "P3090087.JPG",
  "SerialNumber": "?;#ທ\u0008???L?}\u001F9\u000B-?\u001E<\u0014??G#?K%"
}]
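To make the failure mode concrete, here is a minimal Python sketch of what happens when a strict decode('utf-8') meets bytes like the ones above, along with one tolerant fallback. The helper name decode_tool_output and the sample bytes are made up for illustration; this is not the library's actual code.

```python
def decode_tool_output(raw: bytes) -> str:
    """Decode bytes as UTF-8; on failure, replace invalid sequences with U+FFFD."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Lossy fallback: undecodable bytes become the replacement character.
        return raw.decode("utf-8", errors="replace")


# Hypothetical sample: valid UTF-8 text followed by a stray 0xE9 byte.
sample = "SerialNumber: ທ".encode("utf-8") + b"\xe9}"

try:
    sample.decode("utf-8")  # the strict call is the one that blows up
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

text = decode_tool_output(sample)
```

The fallback keeps the rest of the output readable, but the garbage tag value itself is lost, so it only helps when the broken tag is not the one you care about.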

Per the exiftool author, the fix for this seems to be to add the -b (binary output) flag to the exiftool arguments passed to Popen. That way such values are returned as base64-encoded strings, which cannot trigger a unicode decoding error. Overall, encoding is pretty tricky, so I thought I'd post and see if you think this is a bug. If nothing else, perhaps this will be useful to someone else with a similar problem. Let me know if you'd like further diagnostics.
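For anyone trying the -b route, a rough sketch of how the base64 values might be turned back into bytes on the Python side. It assumes that such values arrive in the JSON as strings starting with "base64:"; please check that against your exiftool version. The helper name tag_value_to_bytes and the record below are invented for the example, not real output.

```python
import base64

# Assumption: binary values in the JSON output carry this prefix.
BASE64_PREFIX = "base64:"


def tag_value_to_bytes(value):
    """Return raw bytes for base64-prefixed strings; pass other values through."""
    if isinstance(value, str) and value.startswith(BASE64_PREFIX):
        return base64.b64decode(value[len(BASE64_PREFIX):])
    return value


# Hypothetical record shaped like one entry of the JSON output.
raw_serial = b"\xcf;#\xe0\xba\x97"
record = {
    "SourceFile": "P3090087.JPG",
    "SerialNumber": BASE64_PREFIX + base64.b64encode(raw_serial).decode("ascii"),
}

decoded = {key: tag_value_to_bytes(val) for key, val in record.items()}
```

Ordinary string tags are left alone, so the helper can be applied to every value in a record without checking tag names first.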

noah avatar Jan 03 '17 16:01 noah

@noah Thanks a lot for the report. This library hasn't gotten the attention it deserves for years now, but I hope to get back to it very soon. At first sight, exiftool -j yielding invalid JSON seems like a bug in ExifTool to me, but I'll have to take a closer look to be sure.

smarnach avatar Jan 08 '17 20:01 smarnach

@smarnach I know you know, but there are 5 pull requests waiting for your attention.

rusq avatar Feb 12 '17 00:02 rusq

Here's a test PNG featuring a topless AI girl that works in exiftool but breaks Python wrappers that use text mode: breaks_exif_wrapper.zip
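A small sketch of the text-mode problem in general, not tied to any particular wrapper: the child process here is a stand-in Python one-liner that prints bytes that are not valid UTF-8, where a real case would be exiftool. Reading the pipe in text mode raises, while reading bytes and decoding afterwards lets the caller choose how to handle the damage.

```python
import subprocess
import sys

# Stand-in child process that writes invalid UTF-8 to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xff\\xfe ok')",
]


def run_text_mode() -> str:
    # Text mode decodes inside subprocess, so bad bytes raise UnicodeDecodeError.
    return subprocess.run(
        CHILD, capture_output=True, text=True, encoding="utf-8"
    ).stdout


def run_bytes_mode() -> str:
    # Bytes mode defers decoding; here invalid bytes are replaced, not fatal.
    raw = subprocess.run(CHILD, capture_output=True, check=True).stdout
    return raw.decode("utf-8", errors="replace")


try:
    run_text_mode()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

recovered = run_bytes_mode()
```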

CTimmerman avatar Apr 28 '24 19:04 CTimmerman