exif-parser
wrong encoding on exif strings (bug)
Hey there,
first things first: thanks for the module!
I'm having encoding problems with certain EXIF strings, e.g. the ImageDescription.
This is what other EXIF readers tell me:
Fahrt über den Ozean
but I am getting this with exif-parser:
Fahrt C<ber den Ozean
Seems to me like they are not read with the proper encoding.
I'd fix it myself, but I have no idea how the tags are encoded in the first place :/
Cheers, acidicX
This should be hard. I'd expect this library to assume UTF-8 as the encoding, but your example looks like some 8-bit encoding is used instead? @bwindels, can you briefly describe how this works? Thanks!
I'm not sure if UTF-8 is in the EXIF specs. The image descriptions were edited by Adobe Lightroom, but Adobe sometimes hates standards (check the SVG export of Illustrator :-1: ).
Hi, I figured this out. Using the ArrayBuffer, DataView and TextDecoder APIs (with a polyfill) you can read UTF-8 strings. And yes, UTF-8 is the best choice: it is backward compatible with ASCII, and I confirmed it works for Czech.
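A minimal sketch of that approach, for illustration only. The helper name readUtf8String and the sample bytes are made up here and are not part of exif-parser. It assumes the value is stored as UTF-8 and ends with a NUL terminator, as EXIF ASCII values normally do.

```javascript
// Hypothetical helper (not part of exif-parser): read `length` bytes at
// `offset` from a DataView and decode them as UTF-8.
function readUtf8String(dataView, offset, length) {
  // View the requested byte range of the underlying ArrayBuffer without copying.
  const bytes = new Uint8Array(dataView.buffer, dataView.byteOffset + offset, length);
  // Drop trailing NUL terminator bytes, if any.
  let end = bytes.length;
  while (end > 0 && bytes[end - 1] === 0) {
    end--;
  }
  return new TextDecoder('utf-8').decode(bytes.subarray(0, end));
}

// Sample data: the UTF-8 bytes of the description plus a NUL terminator.
const encoded = new TextEncoder().encode('Fahrt über den Ozean\0');
const view = new DataView(encoded.buffer, encoded.byteOffset, encoded.byteLength);
const description = readUtf8String(view, 0, view.byteLength);
console.log(description);
```

In environments without a native TextDecoder, a polyfill would have to be loaded before this runs.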
@langpavel I just found that the bug still exists. Did you fork the lib to resolve it? It seems that PRs are not actively worked on anymore... Or did you find a better lib?
Hi, I have no time to work on this, sorry..
EXIF assumes ASCII and doesn't have a field to specify an encoding, so without using an encoding detection library, this will be hard to do. Since this library needs to work in the browser as well as in node.js, I'd be hesitant to add a big thing like encoding detection to it.
More info here as well: https://stackoverflow.com/questions/19284205/safe-to-use-utf8-decoding-for-exif-property-marked-as-ascii
I did notice that on node.js, the library forcefully decodes using ASCII, while in the browser it uses UTF-16 (compatible with ASCII). Ideally, it should decode using UTF-8 on both platforms, since that's what's most widely used and it is compatible with ASCII as well. Your example text might in fact be encoded with UTF-8. Browser support for UTF-8 decoding is not ubiquitous, so this might be hard to do cross-platform. I'll have a look.
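To illustrate the difference on node.js, here is a small sketch. It assumes the description is stored as UTF-8 bytes, which has not been confirmed for the image in question. With that assumption, decoding as ASCII reproduces the exact string from the report, "Fahrt C<ber den Ozean". The fromCharCode line is only a guess at what a byte-per-character browser path would produce, not the library's actual code.

```javascript
// Assumption: the file stores the description as UTF-8 bytes.
const raw = Buffer.from('Fahrt über den Ozean', 'utf8');

// 'ascii' clears the high bit of every byte: 0xC3 0xBC becomes 0x43 0x3C ("C<").
const asAscii = raw.toString('ascii');

// 'latin1' maps every byte to one character: 0xC3 0xBC becomes "Ã¼".
const asLatin1 = raw.toString('latin1');

// One UTF-16 code unit per byte gives the same result as 'latin1'.
const asCharCodes = String.fromCharCode(...raw);

// 'utf8' combines the two bytes back into "ü".
const asUtf8 = raw.toString('utf8');

console.log(asAscii);
console.log(asLatin1);
console.log(asCharCodes);
console.log(asUtf8);
```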
Released 0.1.11, which uses UTF-8 on node.js. If you want, you could test whether the description in your image decodes properly now on node.js. For the browser, we'd have to use TextDecoder if supported, and fall back to String.fromCodePoint and String.fromCharCode if not. I don't have time to do this right now, but you're welcome to make a PR.
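A rough sketch of what such a browser fallback could look like. The function names decodeUtf8 and decodeUtf8Manually are hypothetical and this is not the library's code. The manual path assumes well-formed UTF-8 and does no validation of malformed or truncated sequences.

```javascript
// Hypothetical helper: prefer the native TextDecoder, else decode by hand.
function decodeUtf8(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  return decodeUtf8Manually(bytes);
}

// Minimal manual UTF-8 decoder built on String.fromCodePoint.
// Assumes well-formed input; malformed sequences are not detected.
function decodeUtf8Manually(bytes) {
  let result = '';
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i++];
    let codePoint;
    if (b0 < 0x80) {
      // 1 byte: 0xxxxxxx
      codePoint = b0;
    } else if (b0 < 0xe0) {
      // 2 bytes: 110xxxxx 10xxxxxx
      codePoint = ((b0 & 0x1f) << 6) | (bytes[i++] & 0x3f);
    } else if (b0 < 0xf0) {
      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      codePoint = ((b0 & 0x0f) << 12) |
        ((bytes[i++] & 0x3f) << 6) |
        (bytes[i++] & 0x3f);
    } else {
      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      codePoint = ((b0 & 0x07) << 18) |
        ((bytes[i++] & 0x3f) << 12) |
        ((bytes[i++] & 0x3f) << 6) |
        (bytes[i++] & 0x3f);
    }
    result += String.fromCodePoint(codePoint);
  }
  return result;
}

// Sample: the UTF-8 bytes of "über".
const sample = new Uint8Array([0xc3, 0xbc, 0x62, 0x65, 0x72]);
console.log(decodeUtf8(sample));
```

Building the string by concatenation keeps the sketch short; for long values, collecting the code points in chunks would likely be faster.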
@acidicX if this issue is still a problem for you, could you share the image that causes it so others can look at it and try to fix it or suggest changes?
TextDecoder seems reasonably supported nowadays. Worth a look at some point.