pyexiftool
decoding to utf-8 issues
I am using your excellent library to extract EXIF from a largish repository of images (100k+). I've encountered an encoding-related issue. Basically, exiftool returns a garbage tag value and it breaks the call to decode('utf-8') in execute_json().
If I'm reading it correctly, your code assumes that whatever it reads from exiftool can be decoded as UTF-8 (and is valid JSON). But this does not always seem to be the case:
% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG
SerialNumber : #ທ.L.9.-.<.#K%
% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG > file
% cat -v test.json
Serial Number : M-O;#M-`M-:M-^W.M--M-OM-ILM-i}.9.-M-..M-vM-^PM-=M-#<.M-^QM-dG#M-%K%
% exiftool -j -SerialNumber P3090087.JPG
[{
"SourceFile": "P3090087.JPG",
"SerialNumber": "?;#ທ\u0008???L?}\u001F9\u000B-?\u001E<\u0014??G#?K%"
}]
Per the exiftool author, the fix for this seems to be to add the -b (binary output) flag to the call to Popen. This way base64-encoded strings are returned, which cannot trigger a Unicode decoding error. Overall, encoding is pretty tricky, so I thought I'd post and see if you think this is a bug. If nothing else, perhaps this will be useful to someone else with a similar problem. Let me know if you'd like further diagnostics.
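For anyone hitting the same error, here is a minimal sketch of two ways a wrapper could cope with such output. It is not pyexiftool's actual code: the helper names parse_exiftool_json and decode_base64_value are hypothetical, the sample bytes are made up to resemble the SerialNumber value above, and the "base64:" prefix on binary values is an assumption about what exiftool emits when -j and -b are combined.

```python
import base64
import json


def parse_exiftool_json(raw: bytes):
    """Parse exiftool -j output without failing on invalid UTF-8.

    Hypothetical helper: invalid byte sequences are replaced with
    U+FFFD instead of raising UnicodeDecodeError.
    """
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


def decode_base64_value(value):
    """Return the raw bytes of a 'base64:'-prefixed tag value.

    Assumption: with -j and -b, binary values arrive as strings that
    start with 'base64:'. Any other value is returned unchanged.
    """
    prefix = "base64:"
    if isinstance(value, str) and value.startswith(prefix):
        return base64.b64decode(value[len(prefix):])
    return value


# Made-up sample: 0xCF is not followed by a continuation byte,
# so a strict UTF-8 decode of this output fails.
raw = b'[{"SourceFile": "P3090087.JPG", "SerialNumber": "\xcf;#\xe0\xba\x97"}]'

try:
    raw.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

tags = parse_exiftool_json(raw)[0]
encoded = "base64:" + base64.b64encode(b"\xcf;#").decode("ascii")
recovered = decode_base64_value(encoded)
```

The tolerant decode keeps the wrapper from crashing but loses the original bytes; the base64 route preserves them, at the cost of an extra decoding step for each affected tag.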
@noah Thanks a lot for the report. This library hasn't gotten the attention it deserves for years now, but I hope to get back to it very soon. At first sight, exiftool -j yielding invalid JSON seems like a bug in ExifTool to me, but I'll have to take a closer look to be sure.
@smarnach I know you know but there are 5 pull requests waiting for your attention.
Here's a test PNG featuring a topless AI girl that works in exiftool but breaks Python wrappers that use text mode: breaks_exif_wrapper.zip