instascan
Encoding issues / Umlaut is not decoded correctly
I have trouble decoding the QR code from this PDF (on page 27).
It seems the Umlaut in the last line is not decoded correctly. Screenshot from the live demo:
The last line should read ..."für Gartenarbeit und Entsorgung"...
I can decode the QR code just fine in Java using ZXing. If I set the CHARACTER_SET decoding hint to "ISO-8859-1", the decoded result is exactly the same as pictured in the screenshot, so I suspect that ISO-8859-1 is assumed somewhere in InstaScan.
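The suspected failure mode can be reproduced without a camera. The sketch below is not InstaScan code. It assumes a runtime that provides `TextEncoder` (modern browsers, Node.js 11+), and the helper name `misreadAsLatin1` is made up for illustration. It encodes a string as UTF-8 and then reads every byte as if it were one ISO-8859-1 character:

```javascript
// Hypothetical helper: shows what happens when UTF-8 bytes are
// interpreted as ISO-8859-1 (one byte = one character).
function misreadAsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let result = "";
  for (const byte of utf8Bytes) {
    // ISO-8859-1 maps byte values 0..255 directly to U+0000..U+00FF.
    result += String.fromCharCode(byte);
  }
  return result;
}

console.log(misreadAsLatin1("f\u00fcr Gartenarbeit")); // prints "fÃ¼r Gartenarbeit"
```

The single character "ü" becomes the two characters "Ã¼", which is the typical pattern when UTF-8 text is decoded with a single-byte encoding.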
Here's the QR Code I used for easier copy/pasting:
Is there a way to specify the encoding to use, or is this a bug?
In PHP, use utf8_decode. This converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.
In JavaScript, the following does the same:
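// In the scan handler: decode the scanned content before storing it.
var decoded_content = self.utf8_decode(content);
self.scans.unshift({ date: +(Date.now()), content: decoded_content });

// Defined on the same object as the scan handler.
// Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1.
utf8_decode: function (str_data) {
  // Declare every variable locally; the original chained assignment created implicit globals.
  var string = "", i = 0, c = 0, c2 = 0, c3 = 0;
  while (i < str_data.length) {
    c = str_data.charCodeAt(i);
    if (c < 128) {
      // Single-byte (ASCII) character.
      string += String.fromCharCode(c);
      i++;
    } else if ((c > 191) && (c < 224)) {
      // Two-byte sequence.
      c2 = str_data.charCodeAt(i + 1);
      string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
      i += 2;
    } else {
      // Everything else is treated as a three-byte sequence; four-byte sequences are not handled.
      c2 = str_data.charCodeAt(i + 1);
      c3 = str_data.charCodeAt(i + 2);
      string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
      i += 3;
    }
  }
  return string;
}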
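A shorter variant is possible where `TextDecoder` is available (modern browsers, Node.js 11+). This is only a sketch under that assumption, and `decodeUtf8ByteString` is a hypothetical name, not part of InstaScan. Like the hand-written decoder above, it expects a string in which every character code is one raw byte:

```javascript
// Hypothetical alternative to the hand-written decoder.
// Input: a string in which every character code is one raw byte (0..255).
function decodeUtf8ByteString(byteString) {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i) & 0xff; // keep the low byte only
  }
  // TextDecoder also handles 4-byte sequences and replaces
  // malformed input with U+FFFD instead of producing garbage.
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(decodeUtf8ByteString("f\u00c3\u00bcr")); // prints "für"
```

Note that this only helps when the text in the QR code really is UTF-8; content in another encoding such as cp1251 would still come out wrong.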
Having the same issue. Cyrillics are decoded into gibberish:
ÐаннÑй кÑпон ÑгенеÑиÑован
Having the same issue with Korean.
The problem is in this piece of code: https://github.com/schmich/instascan/blob/b0f9519f2dd2a6661e67066d6ed678e621dd5ce2/src/scanner.js#L101 but I haven't figured out how to fix it yet.
@alekciy Thank you for the tip. I added a UTF-8 decoder at that line and it worked.
Though this might not get merged. In case somebody needs this fix, you can clone the repo, apply the fix yourself and rebuild the package with:
npm install
./node_modules/.bin/gulp release
The instascan.min.js file will appear in the dist directory.
And what if it's cp1251? For example, payment slips per GOST R 56042-2014, format ST00011. Ideally, an encoding detector would be added.
@alekciy I don't think there is a reliable way to detect text encoding, especially when it's CP encodings. It would probably be better to add an encoding parameter to the Scanner class.
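As a rough illustration of what a caller-supplied encoding could look like: the sketch below is hypothetical and is not an existing InstaScan option. It assumes the raw payload bytes are accessible as a `Uint8Array` and that the runtime's `TextDecoder` supports the requested label; `decodePayload` is a made-up name.

```javascript
// Hypothetical helper, not part of InstaScan: decode raw QR payload bytes
// using an encoding chosen by the caller.
function decodePayload(bytes, encoding = "utf-8") {
  // fatal: true makes decode() throw a TypeError on bytes that are invalid
  // in the chosen encoding instead of silently inserting U+FFFD.
  const decoder = new TextDecoder(encoding, { fatal: true });
  return decoder.decode(bytes);
}

// The bytes 0x66 0xC3 0xBC 0x72 are "für" encoded as UTF-8.
const examplePayload = new Uint8Array([0x66, 0xc3, 0xbc, 0x72]);
console.log(decodePayload(examplePayload)); // prints "für"

// A caller who knows the payload is cp1251 would pass that label instead,
// e.g. decodePayload(bytes, "windows-1251"), if the runtime supports it.
```

Strict decoding at least makes a wrong guess visible for UTF-8, since invalid byte sequences raise an error rather than producing gibberish. It does not amount to detection: most single-byte code pages accept nearly any byte sequence, which is why an explicit parameter seems the safer design.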