banana_split icon indicating copy to clipboard operation
banana_split copied to clipboard

Find a way to fit more info into QR codes

Open kirushik opened this issue 3 years ago • 13 comments

Current maximum size of QR payload is limited by the truncation which happens in the QR generation library after certain size of the payload. If we choose a library which allows larger QR codes (and then making sure that the QR reading library also supports those sizes), we might increase the maximum size fitting to a A4 page significantely. We should also investigate compression, QR encoding alphabets and redundancy options.

(Meaningful size target here would be ascii-armored 4096 bit GPG RSA key.)

kirushik avatar Aug 03 '20 21:08 kirushik

Meaningful size target here would be ascii-armored 4096 bit GPG RSA key

I don't think that's possible.

The maximum QR code capacity is 2953 bytes (Wiki)

Size of a 4096 bit GPG RSA key is almost twice as much (In my tests I got 4908 bytes).

Each QR shard is the same size as an original payload. (secrets.js-grempe doc). In other words we can't put more data using more shards.

prybalko avatar Sep 29 '21 13:09 prybalko

@prybalko well, the actual export of gpg --export-secret-keys is quite redundant, and there's an option which is intended to limit it: --export-options export-minimal. Unfortunately, there's a bug and that option doesn't work.

So there's a http://www.jabberwocky.com/software/paperkey/ project, which is intended to strip down the GPG exported output to the slimmest meaningful payload. And for me gpg --export-secret-key <4096 bit key>|paperkey --output-type raw|wc -c gives 2631 bytes. So I guess we would be able to fit into the constraints — provided that we provide some precise instructions on how to do that.

kirushik avatar Sep 29 '21 13:09 kirushik

Each QR shard is the same size as an original payload. (secrets.js-grempe doc). In other words we can't put more data using more shards.

Why was this done this way? If you use lagrange interpolation then a k-of-n deployment yields k * 2953 bytes

You need some redundancy to remain zero knowledge, but you could make the output scale linearly in the input size, maybe (k/2) * 2953 bytes or something, not 100% sure right now.

burdges avatar Sep 29 '21 20:09 burdges

@burdges do you mean the length of our plaintext leaking via the length of the cyphertext, or some other leak vector I don't get at 8pm on Friday? If it's about plaintext length, we'd better just pad the input to some discreet values.

I'd really-really prefer not to mess with secrets.js-grempe, the idea for BananaSplit it to be a minimal codebase on top of already-audited primitives...

kirushik avatar Oct 01 '21 18:10 kirushik

you could make the output scale linearly in the input size

@burdges could you elaborate how? Output is a value of the polynomial at a particular point. k determines the degree of the polynomial and n is the number of points on it.

prybalko avatar Oct 04 '21 08:10 prybalko

And for me gpg --export-secret-key <4096 bit key>|paperkey --output-type raw|wc -c gives 2631 bytes.

--output-type raw outputs binary data, in base64 it's going to be 33% larger and won't fit into QR.

Also, we loose additional 33% on conversion from binary secrets.js-grempe output to base64 string for the QR generating library. So realistically we can store around 2000 bytes

prybalko avatar Oct 04 '21 08:10 prybalko

It's more another project than this one but..

Reed-Solomn and Shamir secret sharing are both Lagrange interpolation. Reed-Solomn encodes only real data, so if you've only some shares and the data obeys some known distribution, then you learn something about the distribution of the original data. You presumably use Shamir secret sharing here because it addresses this weakness by encoding k-1 random strings. It cannot have shares smaller than the original data because it's encoding so much randomness. In principle, one could find some middle ground where you encode enough randomness that nothing really leaks, but messy.

In fact, the best approach is something else:

  • encrypt your data with a random symmetric key x,
  • k-of-n Reed-Solomn share the encrypted data, and
  • k-of-n Shamir secret share the symmetric key x.

Now each shard is like data_length / k + 48 bytes, so 32 bytes for x and 16 bytes for a MAC. Actually Rogaway wrote some papers on doing secret sharing properly, which I think Fredric Jacob planed upon implementing, so really one should simply use that work, assuming it does some similar trick to this.

It sounds like another project, or really an add one for Fred's project, like I said..

burdges avatar Oct 04 '21 08:10 burdges

@prybalko those are valid places to lose the compression rate — but neither sounds like something which cannot be solved. For example, we might detect if the secret's contents are base64-encoded and decode it before putting into the QR?

I totally get how this is a bigger of an undertaking than expected, so wouldn't mind if we just settle on the concrete plan for this for now (say, a checklist as a part of this issue) and not include this issue in the next release's milestone.

kirushik avatar Oct 04 '21 11:10 kirushik

@burdges oh, I like the trick with RS-encoding the encrypted data. funnily enough, we already symmetrically-encrypt the shard data (https://github.com/paritytech/banana_split/blob/19bc8b3da7726e6f4ae11573742d2a0c10619c66/src/util/crypto.ts#L110), just to protect against malicious printers (which are in our threat model).

Currently we (ask users to) put the full decryption passphrase in handwriting onto the printouts — but if we salt that seedphrase and shamir-share that additional salt I think we'd be good. Not something I'd change in the nearest release, but maybe later down the road.

Do you have any links to the Rogaway's work in writing anywhere? I'd prefer to split that out in a separate issue.

kirushik avatar Oct 04 '21 11:10 kirushik

so wouldn't mind if we just settle on the concrete plan for this for now

Combining Reed-Solomon with Shamir secret share sounds like a big step towards increasing payload capacity. Not sure it's worth implementing the base64 trick right now.

prybalko avatar Oct 05 '21 08:10 prybalko

Again, one should look at Fredric Jacobs code https://github.com/SpinResearch/RustySecrets before running off on my off the cuff suggestion. ;)

It looks as if he never implemented Rogaway's paper though, not sure the situation. I'll try to ask him sometime.

burdges avatar Oct 05 '21 13:10 burdges

Unfortunately, it's borderline impossible to embed webassembly bytecode directly into our .html file (and being a single file is one of the important design constraints for BananaSplit). So we're confined in the JS land when it comes to dependencies.

RustySecrets doesn't do anything fancy with Reed-Solomon encoding, though — just a regular Shamir, but authenticated with signatures?

In BananaSplit authentication happens in the AEAD encryption mode (with the decryption phrase you copy to the shards in handwriting), having an extra authenticity layer doesn't seem that important to me here...

kirushik avatar Oct 05 '21 17:10 kirushik

I strangely see nothing there about Rogaway's paper either.. https://eprint.iacr.org/2020/800

burdges avatar Oct 05 '21 17:10 burdges