exif-parser

wrong encoding on exif strings (bug)

Open acidicX opened this issue 8 years ago • 11 comments

Hey there,

first things first: thanks for the module!

I'm having encoding problems with certain EXIF strings, e.g. the ImageDescription.

This is what other EXIF readers tell me: Fahrt über den Ozean
but I am getting this with exif-parser: Fahrt C<ber den Ozean

Seems to me like they are not read with the proper encoding.

I'd fix it myself, but I have no idea how the tags are encoded in the first place :/

Cheers, acidicX

acidicX avatar Dec 20 '15 21:12 acidicX

This could be hard. I expect that this library assumes UTF-8 as the encoding, but your example looks like some 8-bit encoding is used instead. @bwindels, can you briefly describe how this works? Thanks!

langpavel avatar Jul 11 '16 02:07 langpavel

I'm not sure if UTF-8 is in the EXIF specs. The image descriptions were edited by Adobe Lightroom, but Adobe sometimes hates standards (check the SVG export of Illustrator :-1: ).

acidicX avatar Jul 15 '16 07:07 acidicX

Hi, I figured this out. Using the ArrayBuffer, DataView, and TextDecoder APIs (with a polyfill where needed), you can read UTF-8 strings. And yes, UTF-8 is the best choice: it is backward compatible with ASCII, and I confirmed it works for Czech.
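A minimal sketch of that approach, assuming the tag value is available as a byte range inside a DataView. The helper name `readUtf8String` and its parameters are hypothetical and not part of exif-parser; `TextDecoder` and `TextEncoder` are assumed to be available globally (modern browsers, recent Node.js) or through a polyfill.

```javascript
// Hypothetical helper: read `length` bytes starting at `offset` as a UTF-8 string.
function readUtf8String(dataView, offset, length) {
  const bytes = new Uint8Array(
    dataView.buffer,
    dataView.byteOffset + offset,
    length
  );
  // EXIF ASCII values are NUL-terminated, so cut at the first 0 byte if present.
  const end = bytes.indexOf(0);
  const payload = end === -1 ? bytes : bytes.subarray(0, end);
  return new TextDecoder('utf-8').decode(payload);
}

// Usage with made-up data standing in for an ImageDescription value.
const sampleBytes = new TextEncoder().encode('Fahrt über den Ozean\0');
const sampleView = new DataView(
  sampleBytes.buffer,
  sampleBytes.byteOffset,
  sampleBytes.byteLength
);
console.log(readUtf8String(sampleView, 0, sampleView.byteLength));
```

Because ASCII is a subset of UTF-8, plain ASCII descriptions decode the same way as before.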

langpavel avatar Jul 15 '16 13:07 langpavel

@langpavel I just found that the bug still exists. Did you fork the lib to resolve it? It seems that PRs are not actively worked on anymore. Or did you find a better lib?

acidicX avatar Nov 29 '16 19:11 acidicX

Hi, I have no time to work on this, sorry..

langpavel avatar Dec 03 '16 12:12 langpavel

EXIF assumes ASCII and doesn't have a field to specify an encoding, so without using an encoding detection library, this will be hard to do. Since this library needs to work in the browser as well as in node.js, I'd be hesitant to add a big thing like encoding detection to it.
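Short of full encoding detection, one lightweight heuristic is to try a strict UTF-8 decode first and fall back to a single-byte interpretation when the bytes are not valid UTF-8. This is only a sketch: `decodeExifString` is a hypothetical name, the Latin-1 fallback is an assumption about what legacy writers produce, and none of this is what exif-parser itself does.

```javascript
// Hypothetical helper: strict UTF-8 first, Latin-1 as an assumed fallback.
function decodeExifString(bytes) {
  try {
    // With fatal: true the decoder throws a TypeError on invalid UTF-8
    // instead of silently inserting replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Assumed fallback: map each byte to the code point of the same value.
    return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
  }
}

// "Fü" stored as UTF-8 (0xC3 0xBC) and as Latin-1 (0xFC).
const storedAsUtf8 = decodeExifString(new Uint8Array([0x46, 0xc3, 0xbc]));
const storedAsLatin1 = decodeExifString(new Uint8Array([0x46, 0xfc]));
console.log(storedAsUtf8, storedAsLatin1);
```

The heuristic can still guess wrong for short strings in other 8-bit encodings, which is why it is no substitute for real detection.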

bwindels avatar Jul 09 '17 08:07 bwindels

More info here as well: https://stackoverflow.com/questions/19284205/safe-to-use-utf8-decoding-for-exif-property-marked-as-ascii

bwindels avatar Jul 09 '17 08:07 bwindels

I did notice that on Node.js, the library forcefully decodes using ASCII, while in the browser it uses UTF-16 (compatible with ASCII). Ideally, it should decode using UTF-8 on both platforms, since that's the most widely used encoding and is compatible with ASCII as well. Your example text might in fact be encoded as UTF-8. Browser support for UTF-8 decoding is not ubiquitous, so this might be hard to do cross-platform. I'll have a look.
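The reported output is consistent with UTF-8 bytes being decoded as ASCII on Node.js. A small illustration using only Node's built-in `Buffer` (the variable names are made up for this example):

```javascript
// In UTF-8, "ü" is the two bytes 0xC3 0xBC.
const descriptionBytes = Buffer.from('Fahrt über den Ozean', 'utf8');

// Node's 'ascii' decoding clears the high bit of every byte,
// so 0xC3 0xBC becomes 0x43 0x3C, which is "C<".
const decodedAsAscii = descriptionBytes.toString('ascii');

// Decoding the same bytes as UTF-8 restores the original text.
const decodedAsUtf8 = descriptionBytes.toString('utf8');

console.log(decodedAsAscii); // Fahrt C<ber den Ozean
console.log(decodedAsUtf8);  // Fahrt über den Ozean
```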

bwindels avatar Jul 09 '17 09:07 bwindels

Released 0.1.11, which uses UTF-8 on Node.js. If you want, you could test whether the description in your image now decodes properly on Node.js. For the browser, we'd have to use TextDecoder if supported, and fall back to fromCodePoint and fromCharCode if not. I don't have time to do this right now, but you're welcome to make a PR.
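A rough sketch of what such a browser fallback could look like, assuming the string bytes are available as a `Uint8Array`. Both function names are hypothetical and not part of exif-parser, and the manual path uses only `String.fromCharCode` (with surrogate pairs) and does not validate malformed input.

```javascript
// Hypothetical entry point: prefer the native decoder when it exists.
function decodeUtf8(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  return decodeUtf8Manually(bytes);
}

// Minimal manual decoder for well-formed UTF-8 (no error handling).
function decodeUtf8Manually(bytes) {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i++];
    let codePoint;
    if (b0 < 0x80) {
      codePoint = b0; // 1-byte sequence (plain ASCII)
    } else if (b0 < 0xe0) {
      codePoint = ((b0 & 0x1f) << 6) | (bytes[i++] & 0x3f); // 2 bytes
    } else if (b0 < 0xf0) {
      codePoint = ((b0 & 0x0f) << 12) |
        ((bytes[i++] & 0x3f) << 6) |
        (bytes[i++] & 0x3f); // 3 bytes
    } else {
      codePoint = ((b0 & 0x07) << 18) |
        ((bytes[i++] & 0x3f) << 12) |
        ((bytes[i++] & 0x3f) << 6) |
        (bytes[i++] & 0x3f); // 4 bytes
    }
    if (codePoint > 0xffff) {
      // Outside the BMP: emit a UTF-16 surrogate pair.
      const offset = codePoint - 0x10000;
      out += String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
    } else {
      out += String.fromCharCode(codePoint);
    }
  }
  return out;
}

console.log(decodeUtf8(new Uint8Array([0x46, 0xc3, 0xbc]))); // "F" followed by "ü"
```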

bwindels avatar Jul 09 '17 09:07 bwindels

@acidicX if this issue is still a problem for you, could you share the image that causes it so others can look at it and try to fix it or suggest changes?

SergioCrisostomo avatar Oct 06 '17 07:10 SergioCrisostomo

TextDecoder seems reasonably supported nowadays. Worth a look at some point.

bwindels avatar Jul 26 '18 21:07 bwindels