PKI.js Support streams in CMS EnvelopedData and SignedData

I'd like to encrypt/decrypt and sign/verify potentially large plaintexts, which means I'd have to use readable streams instead of ArrayBuffer instances, but PKI.js doesn't support that today as far as I can tell.

Would you be open to supporting streams if someone else were to implement it in a PR? I suspect you won't because that'd mean not using WebCrypto (see: https://github.com/w3c/webcrypto/issues/73), but I'm checking just in case as I might not have an option but to create my own implementation of EnvelopedData and SignedData, which I'd love to avoid.

In case you'll entertain the thought of supporting streams, these are other consequences to bear in mind off the top of my head:

EnvelopedData couldn't contain the ciphertext -- instead, we'd have to expose it in a new readable stream.
SignedData won't be able to use vanilla Ed25519/Ed448 because they don't support streaming, but we could use their pre-hash variants.

Jul 11 '19 12:07 gnarea

@gnarea As you correctly stated PKI.js based (initially) on top of WebCrypto API which does not support streaming. Theoretically it is possible to build a "stream-friendly" version of custom crypto engine, but at the moment only SignedData supports crypto engine's functionality. Making EnvelopedData to use crypto engine is in a future plan.

Jul 11 '19 12:07 YuryStrozhevsky

Could you propose how you would approach such a PR.

Jul 11 '19 14:07 rmhrisk

Thanks for the prompt replies!

I must admit Node.js is the only platform I'm targeting and I didn't look into how it'd work in the browser until just now. From the relatively quick research I did just now, it seems like node-forge offers the only stream-based implementations of AES and SHA-2 in the browser (and I need to double check it really supports stream-based AES encryption -- I only found examples of stream-based decryption).

So, assuming that there's a library that offers stream-based implementations of AES and SHA-2 functions in the browser:

Generally, I'd try to keep any changes to EnvelopedData.encrypt() and SignedData.sign() to a minimum because (a) they should continue to use WebCrypto when available for performance reasons and (b) we want to avoid introducing regressions.

I'd probably create a custom crypto engine (or extend a preexisting one) to expose new functions that do encryption/decryption and hashing with streams. For performance reasons, I'd also be tempted to have it use Node.js' crypto lib if possible, or node-forge/etc when running on the browser.

I'd then create methods analogous to the methods above, which receive the plaintext as a readable stream. Here's how they'd work:

Stream-based encryption

const plaintext = fs.createReadStream('file.txt');
const ciphertextOutput = fs.createWriteStream('file.txt.encrypted');

const envelopedData = new EnvelopedData({...});
const encrypt = envelopedData.encryptStream({ name: 'AES-GCM', length: 128 })

encrypt.on('finish', () => {
  const contentInfo = new pkijs.ContentInfo({
    content: envelopedData.toSchema(),
    contentType: '1.2.840.113549.1.7.3'
  });
  fs.writeFileSync('enveloped-data.der', contentInfo.toSchema().toBER(false));
});

plaintext.pipe(encrypt).pipe(ciphertextOutput)

Behind the scenes, EnvelopedData.encryptStream() could be calling EnvelopedData.encrypt() with a dummy/empty plaintext to populate envelopedData accordingly, and it'd then delete encryptedContentInfo.encryptedContent to detach the invalid ciphertext. This feels like an ugly hack though -- A better but more invasive alternative is to move all the code that populates envelopedData to a separate (protected) method.

Stream-based signature

const plaintext = fs.createReadStream('file.txt');

const signedData = new SignedData({...});
await signedData.signStream(privateKey, 0, 'SHA-256', plaintext)

const contentInfo = new pkijs.ContentInfo({
  content: envelopedData.toSchema(),
  contentType: '1.2.840.113549.1.7.2'
});
fs.writeFileSync('signed-data.der', contentInfo.toSchema().toBER(false));

And behind the scenes, SignedData.signStream() would calling SignedData.sign() with no data argument after calculating the digest of the input stream and putting it in signedData.signerInfos[0].signedAttrs.

Let me know what you think.

Jul 11 '19 16:07 gnarea

@gnarea You do not need to invent your own API because in case you are using crypto engine's functionality API already exists. In case you are using Node.js only it would be even simplier: there is already made Node-specific crypto engine. As for EnvelopedData: it is very hard to abstract encryption process from low-level encryption engine. Probably I will have a time in a nearest future.

Jul 12 '19 05:07 YuryStrozhevsky

Thanks for the pointers @YuryStrozhevsky. I hadn't come across the NodeEngine.js file before.

Can you please elaborate on what you mean by the API already exists? CryptoEngine.(encrypt|decrypt) only works with promises, so we'd still be holding the plaintext and ciphertext in memory. And we couldn't use those methods to encrypt/decrypt individual chunks of the plaintext/ciphertext because the resulting chunks couldn't be concatenated.

The same applies to CryptoEngine.digest, although that isn't used by SignedData directly, so the developer could calculate the digest using streams before initialising SignedData.

Jul 12 '19 08:07 gnarea

@gnarea You need to inspect all functions from CryptoEngine class. The encrypt/decrypt/sign/verify is a kind of helpers for more advance API functions like signWithPrivateKey, verifyWithPublicKey and other functions whose would be used in SignedData class. Also you can always re-design usage of data parameter: for example you could set data = { file: "path" } and make your own digest function handling such variant of data (it is only an example on how to make "problem-specific" engines).

Jul 12 '19 08:07 YuryStrozhevsky

I was aware of the way SignedData used CryptoEngine.(signWithPrivateKey|verifyWithPrivateKey), and how they in turn use CryptoEngine.(sign|verify), but I'm not sure how those methods in CryptoEngine relate to supporting streams -- We don't have to change them at all, right? When it comes to supporting digital signatures with streams, the only problem is the digest().

As for having the data argument take a different form, such as data = { file: "path" }, I'm happy with that. But just to be clear, you're only suggesting to do that with encrypt, decrypt and digest, right?

Jul 12 '19 09:07 gnarea

Actually, I just realised that SignedData could potentially sign the plaintext -- I thought it only signed the plaintext's digest. Would you object to an implementation that only worked with digests?

Jul 12 '19 09:07 gnarea

@gnarea The SignedData could have attached data (signature attached to data). In this case you must use entire data, not only digest. So it is not an option to have only digest there.

Jul 12 '19 10:07 YuryStrozhevsky

@YuryStrozhevsky, I forgot about that scenario -- I was talking about the scenario where the data is detached and you have to sign the whole plaintext (not its signature).

In the scenario you just described, where the data is attached and the whole plaintext has to be signed, you don't need the stream-based functionality: The plaintext is already loaded in memory anyway and today's implementation will work just fine.

Jul 12 '19 12:07 gnarea

@gnarea PKIjs is an universal library and we need to implement all kind of scenarios. I think that in your case could be better to create your own class extending SignedData. There you could easily substitute necessary code by what you need. I mean you could extends SignedData and there describe only sign function - all othere functions would remain same.

Jul 12 '19 12:07 YuryStrozhevsky

That works for me.

So I guess this issue is now limited to EnvelopedData. Let me know if you'd like me to update the OP accordingly.

Jul 12 '19 13:07 gnarea

PKI.js PKI.js copied to clipboard

Support streams in CMS EnvelopedData and SignedData

Stream-based encryption

Stream-based signature

PKI.js
PKI.js copied to clipboard