Support for decoding binary-encoded QRs?

Open JL102 opened this issue 8 months ago • 3 comments

Is your feature request related to a problem? Please describe.

Yes. As far as I can tell, when Koder scans a binary-encoded QR code, it converts the payload into a string, and the bytes do not match the original data when I attempt to decode it.

Describe the solution you'd like

In an ideal world, Koder would automatically detect when a QR code is binary-encoded and return a byte array (Uint8Array). Another nice option would be for Koder to always output a byte array, because you can always use a TextDecoder to convert it to a string later in the pipeline.
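To illustrate the second option, here is a minimal sketch of converting a byte array to a string after the fact, assuming the payload is UTF-8 text. The sample bytes are made up for illustration and are not Koder output.

```javascript
// Hypothetical payload: the UTF-8 bytes of "héllo" (not real scanner output).
const payloadBytes = new Uint8Array([104, 195, 169, 108, 108, 111]);

// TextDecoder turns raw bytes into a string; "utf-8" is its default encoding.
const payloadText = new TextDecoder("utf-8").decode(payloadBytes);

console.log(payloadText); // héllo
```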

Describe alternatives you've considered

  1. I tried using TextEncoder to encode the returned string back into a Uint8Array, but the bytes were completely different. An example is at the bottom of this issue.
  2. I tried delving into the code and modifying getScanResults() to return an array of uint8_t* instead of str* without using strcpy, and then modifying koder.js to copy the pointer into a JS Array / Uint8Array, but I got stuck on the last part. I couldn't find any methods for copying an array pointer back to JS in Emscripten's preamble. I did find this Stack Overflow post, but it went WAY over my head. I've never done anything with WebAssembly and my experience with C++ is extremely limited, so this is a little above my pay grade, so to speak.
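A rough sketch of that last step, assuming the C side can expose a pointer plus a byte length and that the Emscripten module exports its heap as `HEAPU8` (Emscripten's byte view of WebAssembly memory). The names `copyBytesFromHeap`, `ptr`, and `len` are hypothetical, and the stand-in heap below only simulates a module so the snippet runs on its own. This is not Koder's actual code.

```javascript
// Copy `len` bytes starting at address `ptr` out of a WebAssembly heap view.
// `heapU8` stands in for an Emscripten module's HEAPU8 (assumption).
function copyBytesFromHeap(heapU8, ptr, len) {
  // subarray() is only a view into the heap; slice() makes an independent copy
  // that stays valid after the C side frees or reuses the memory.
  return heapU8.subarray(ptr, ptr + len).slice();
}

// Simulated heap for illustration; a real module would supply its own HEAPU8.
const fakeHeap = new Uint8Array(64);
fakeHeap.set([221, 128, 0, 130, 48], 16); // pretend the scanner wrote 5 bytes at address 16

const scanned = copyBytesFromHeap(fakeHeap, 16, 5);
console.log(scanned.join(", ")); // 221, 128, 0, 130, 48
```

Passing an explicit length, rather than relying on a terminator, is what would let embedded null bytes survive the trip from C to JS.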

Additional context

I'm writing a mobile web app that's designed to use QR codes to transmit data between devices. Since the QR codes are both created and scanned in-app, the scanner can "expect" the QR code to be encoded in a specific format. I'm using the Node.js package qrcode to encode my QR codes. I compress my data with LZMA, which already outputs an array of bytes. Encoding that array of bytes as Base64 adds roughly 33% to the data size, so it'd be ideal if I could just encode it in binary and then decode it as binary. According to the readme of qrcode (https://www.npmjs.com/package/qrcode#binary-data), converting binary data to a JS string adds extra bytes, which I assume is why the bits in the outputted string look completely different from the original data.
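The Base64 overhead mentioned above follows from the encoding itself: every 3 input bytes become 4 output characters. A small sketch of the arithmetic (the function name and the 300-byte size are made up for illustration):

```javascript
// Length of the padded Base64 encoding of `byteCount` bytes:
// each group of up to 3 bytes becomes 4 characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const rawSize = 300; // hypothetical compressed payload size in bytes
const encodedSize = base64Length(rawSize);
const overheadPercent = ((encodedSize / rawSize - 1) * 100).toFixed(1);

console.log(encodedSize); // 400
console.log(overheadPercent + "%"); // 33.3%
```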

Example of using TextEncoder to try and recover the original bytes:

Original Uint8Array:

221, 128, 128, 128, 130, 48, 131, 128, 128, 128, 128, 128, 128, 128, 189, 8, 7, 98, 161, 94, 30, 168, 143, 80, 172, 15, 239, 93, 198, 149, 251, 171, 84, 103, 81, 243, 211, 208, 147, 195, 233, 145, 36, 205, 135, 118, 48, 202, 120, 129, 152, 65, 109, 218, 36, 249, 121, 153, 165, 158, 18, 205, 209, 250, 171, 153, 126, 126, 83, 118, 236, 74, 185, 112, 171, 179, 196, 92, 170, 222, 104, 147, 56, 229, 229, 42, 55, 210, 235, 47, 151, 238, 63, 206, 178, 21, 4, 89, 213, 170, 164, 99, 27, 114, 136, 208, 30, 71, 208, 248, 159, 171, 55, 53, 238, 72, 195, 1, 175, 140, 205, 94, 106, 75, 189, 130, 20, 254, 159, 50, 35, 236, 12, 60, 92, 226, 164, 249, 28, 72, 42, 3, 95, 249, 77, 139, 204, 23, 231, 208, 103, 244, 140, 80, 240, 197, 199, 17, 201, 79, 32, 167, 27, 0, 194, 98, 104, 217, 243, 161, 103, 97, 240, 90, 190, 191, 48, 137, 241, 197, 140, 114, 201, 210, 120, 199, 71, 116, 234, 27, 86, 154, 186, 146, 217, 196, 232, 254, 150, 40, 3, 244, 60, 126, 141, 180, 44, 4, 33, 82, 211, 220, 237, 69, 247, 22, 45, 25, 221, 242, 232, 210, 195, 130, 249, 0, 87, 101, 93, 57, 89, 162, 19, 238, 202, 144, 176, 165, 29, 68, 51, 137, 236, 206, 30, 82, 206, 194, 11, 86, 151, 50, 82, 91, 186, 8, 107, 186, 60, 241, 8, 207, 152, 133, 18, 55, 231, 95, 213, 158, 136, 45, 173, 252, 48, 40, 130, 64, 103, 78, 88, 81, 77, 180, 68, 86, 6, 111, 151, 192, 212, 95, 55, 179, 224, 223, 25, 224, 68, 166, 8, 105, 212, 143, 37, 145, 171, 176, 209, 128, 223, 160, 201, 121, 138, 242, 66, 118, 162, 50, 232, 182, 172, 181, 8, 209, 217, 154, 229, 183, 186, 52, 65, 101, 45, 42, 111, 175, 8, 145, 253, 26, 108, 88, 133, 29, 47, 194, 64, 75, 156, 220, 46, 14, 40, 246, 159, 109, 97, 217, 166, 40, 31, 187, 40, 103, 157, 134, 68, 238, 237, 140, 106, 68, 96, 250, 140, 83, 194, 2, 98, 102, 62, 153, 32, 178, 241, 12, 224, 172, 25, 94, 241, 23, 173, 178, 40, 135, 66, 42, 212, 214, 19, 255, 143, 159, 255, 192, 86, 24, 5, 248, 59, 217, 163, 94, 146, 173, 17, 144, 27, 190, 47, 105, 187, 132, 141, 148, 155, 208, 169, 20, 172, 37, 96, 
151, 17, 227, 214, 233, 138, 98, 136, 112, 156, 167, 204, 167, 197, 62, 173, 189, 97, 203, 71, 227, 70, 13, 93, 161, 172, 231, 123, 113, 77, 62, 37, 119, 50, 79, 11, 150, 208, 218, 19, 223, 125, 126, 182, 98

Resulting string:

'Ý\x80\x80\x80\x820\x83\x80\x80\x80\x80\x80\x80\x80½\b\x07b¡^\x1E¨\x8FP¬\x0Fï]Æ\x95û«TgQóÓÐ\x93Ãé\x91$Í\x87v0Êx\x81\x98AmÚ$ùy\x99¥\x9E\x12ÍÑú«\x99~~SvìJ¹p«³Ä\\ªÞh\x938åå*7Òë/\x97î?Î²\x15\x04YÕª¤c\x1Br\x88Ð\x1EGÐø\x9F«75îHÃ\x01¯\x8CÍ^jK½\x82\x14þ\x9F2#ì\f<\\â¤ù\x1CH*\x03_ùM\x8BÌ\x17çÐgô\x8CPðÅÇ\x11ÉO §\x1B'

Result of new TextEncoder().encode(str):

195, 157, 194, 128, 194, 128, 194, 128, 194, 130, 48, 194, 131, 194, 128, 194, 128, 194, 128, 194, 128, 194, 128, 194, 128, 194, 128, 194, 189, 8, 7, 98, 194, 161, 94, 30, 194, 168, 194, 143, 80, 194, 172, 15, 195, 175, 93, 195, 134, 194, 149, 195, 187, 194, 171, 84, 103, 81, 195, 179, 195, 147, 195, 144, 194, 147, 195, 131, 195, 169, 194, 145, 36, 195, 141, 194, 135, 118, 48, 195, 138, 120, 194, 129, 194, 152, 65, 109, 195, 154, 36, 195, 185, 121, 194, 153, 194, 165, 194, 158, 18, 195, 141, 195, 145, 195, 186, 194, 171, 194, 153, 126, 126, 83, 118, 195, 172, 74, 194, 185, 112, 194, 171, 194, 179, 195, 132, 92, 194, 170, 195, 158, 104, 194, 147, 56, 195, 165, 195, 165, 42, 55, 195, 146, 195, 171, 47, 194, 151, 195, 174, 63, 195, 142, 194, 178, 21, 4, 89, 195, 149, 194, 170, 194, 164, 99, 27, 114, 194, 136, 195, 144, 30, 71, 195, 144, 195, 184, 194, 159, 194, 171, 55, 53, 195, 174, 72, 195, 131, 1, 194, 175, 194, 140, 195, 141, 94, 106, 75, 194, 189, 194, 130, 20, 195, 190, 194, 159, 50, 35, 195, 172, 12, 60, 92, 195, 162, 194, 164, 195, 185, 28, 72, 42, 3, 95, 195, 185, 77, 194, 139, 195, 140, 23, 195, 167, 195, 144, 103, 195, 180, 194, 140, 80, 195, 176, 195, 133, 195, 135, 17, 195, 137, 79, 32, 194, 167, 27

I can't find any patterns in this result that indicate any resemblance to the original data. Additionally, the array is shorter and I don't know why that would be.
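One possible reading of the output above, offered as a hypothesis and not as a confirmed description of Koder's internals: each original byte appears to have become one character with the same code point (a Latin-1 style mapping), and TextEncoder then re-encoded every character at or above 128 as two UTF-8 bytes, which would explain the 194 and 195 prefixes. The string also stops exactly where the original data has its first 0 byte, which would be consistent with a C-style string copy ending at a null terminator. Under that assumption, the bytes before the first null can be read back with charCodeAt instead of TextEncoder:

```javascript
// Map each character back to a single byte, assuming every code point is < 256.
function latin1StringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 255) {
      throw new RangeError(`Character at index ${i} does not fit in one byte`);
    }
    bytes[i] = code;
  }
  return bytes;
}

// First seven characters of the resulting string from this issue.
const sample = "\u00DD\x80\x80\x80\x820\x83";

console.log(latin1StringToBytes(sample).join(", "));
// 221, 128, 128, 128, 130, 48, 131

console.log(Array.from(new TextEncoder().encode(sample)).join(", "));
// 195, 157, 194, 128, 194, 128, 194, 128, 194, 130, 48, 194, 131
```

This cannot recover anything after the first null byte, so it is a diagnostic aid and not a fix.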

JL102 avatar Nov 07 '23 20:11 JL102

@JL102 Can you provide a QR code image?

maslick avatar Nov 07 '23 21:11 maslick

@maslick Thanks for such a quick response! Sure. Here's the QR code with the binary data I included as the example: image

Side note: I looked around for a native app that can scan binary data, and it looks like the app Binary Eye scans the binary data correctly (after you convert the data to hex): image

JL102 avatar Nov 07 '23 21:11 JL102

I should note one thing: During my research the other day, I found this Stack Overflow post: https://stackoverflow.com/questions/37996101/storing-binary-data-in-qr-codes and in the person's notes, they said ZBar can't handle null bytes. If this is the case, then maybe on the generation side I could manually replace null bytes with something else, like idk, five 01's in a row or something.
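A sketch of how that workaround could look, using a simple escape-byte scheme (a form of byte stuffing) instead of a run of five 0x01 bytes, since a fixed run could collide with real data. The escape values and function names are arbitrary choices for illustration, and this assumes the null byte is the only value the scanner mishandles.

```javascript
const ESC = 0x01; // escape marker (arbitrary choice)

// Replace 0x00 with ESC 0x02 and a literal ESC with ESC 0x01,
// so the encoded output never contains a null byte.
function escapeNulls(bytes) {
  const out = [];
  for (const b of bytes) {
    if (b === 0x00) out.push(ESC, 0x02);
    else if (b === ESC) out.push(ESC, 0x01);
    else out.push(b);
  }
  return Uint8Array.from(out);
}

// Reverse the mapping applied by escapeNulls.
function unescapeNulls(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === ESC) {
      i++; // look at the byte following the escape marker
      if (i >= bytes.length) throw new Error("Truncated escape sequence");
      out.push(bytes[i] === 0x02 ? 0x00 : ESC);
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

const original = Uint8Array.from([167, 27, 0, 194, 1, 98]);
const stuffed = escapeNulls(original);

console.log(stuffed.join(", ")); // 167, 27, 1, 2, 194, 1, 1, 98
console.log(unescapeNulls(stuffed).join(", ")); // 167, 27, 0, 194, 1, 98
```

The cost is one extra byte for every 0x00 or 0x01 in the payload, which for compressed data should be small compared with the Base64 overhead.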

JL102 avatar Nov 08 '23 21:11 JL102