
Encoding/decoding very large values

Open eukaryote opened this issue 10 years ago • 4 comments

Is there a way to use asn1crypto for encoding/decoding values that are too large to fit in memory? My use case is using asn1crypto.cms with very large encrypted data values.

A quick search turned up resources like https://www.ietf.org/mail-archive/web/smime/current/msg01335.html, and based on those, the "indefinite length encoding" and "constructed encoding" features of ASN.1 look relevant.

eukaryote avatar Oct 27 '15 02:10 eukaryote

For these large values, are you decrypting and saving them to disk?

So far, asn1crypto has been developed to deal with files that fit in memory. Most of the structures are being used in some form or another for X.509 certificates, CRLs, OCSP or PKCS#12 files.

I am thinking that for this to work, there would need to be a way to pass a file object to Asn1Value.load() instead of a byte string. Then, each object would copy any necessary header and footer bytes, but the contents would be a reference to the file object, plus an offset and length. That way the file contents would never need to be copied into memory for large chunks of data.
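The file-reference idea above can be sketched in plain Python, independent of asn1crypto. This is only an illustration under stated assumptions: `ValueRef`, `read_header` and `iter_contents` are hypothetical names, not part of asn1crypto, and the sketch handles only single-byte tags and definite lengths.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple


class ValueRef(NamedTuple):
    """Hypothetical record of where a value's contents live in a file."""
    tag: int      # identifier octet (single-byte tags only in this sketch)
    offset: int   # absolute offset of the first content byte
    length: int   # number of content bytes


def read_header(fp: BinaryIO) -> ValueRef:
    """Read one definite-length BER/DER header; fp is left at the contents."""
    first = fp.read(2)
    if len(first) < 2:
        raise EOFError("truncated header")
    tag, length_octet = first[0], first[1]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    if length_octet < 0x80:
        length = length_octet                 # short form
    elif length_octet == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:
        num = length_octet & 0x7F             # long form: num length bytes follow
        raw = fp.read(num)
        if len(raw) < num:
            raise EOFError("truncated length")
        length = int.from_bytes(raw, "big")
    return ValueRef(tag, fp.tell(), length)


def iter_contents(fp: BinaryIO, ref: ValueRef,
                  chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the referenced contents in chunks instead of loading them whole."""
    fp.seek(ref.offset)
    remaining = ref.length
    while remaining:
        chunk = fp.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("truncated contents")
        remaining -= len(chunk)
        yield chunk
```

Only the few header bytes are copied; the contents stay in the file until something iterates over them.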

For serializing objects via .dump(), there would need to be an optional file object parameter that would receive all of the output.
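A rough sketch of what writing to a file object could look like for a single large OCTET STRING. This is not an existing asn1crypto API: `encode_length` and `dump_octet_string` are hypothetical helpers, and the content length is assumed to be known up front so a definite-length DER header can be written first.

```python
import io
from typing import BinaryIO


def encode_length(length: int) -> bytes:
    """Return the DER definite-length octets for a content length."""
    if length < 0x80:
        return bytes([length])                       # short form
    raw = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw            # long form


def dump_octet_string(src: BinaryIO, length: int, dst: BinaryIO,
                      chunk_size: int = 64 * 1024) -> None:
    """Write an OCTET STRING header, then copy `length` bytes from src to dst."""
    dst.write(b"\x04" + encode_length(length))       # 0x04 = universal OCTET STRING
    remaining = length
    while remaining:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("source ended before the declared length")
        dst.write(chunk)
        remaining -= len(chunk)
```

At most `chunk_size` bytes of the contents are held in memory at any time.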

While the library can parse indefinite length values, I think the real challenge will be how to restructure the existing functionality to treat the .contents attribute of each Asn1Value object as something other than just a byte string.

wbond avatar Oct 27 '15 03:10 wbond

Yes, the decryption of the large values (things like EncryptedContent inside a CMS EncryptedContentInfo) happens incrementally, writing output to a temporary file and then verifying the authentication tag before doing anything else with the plaintext.

The EncryptedContent is optional in CMS, so I could store the ciphertext blob separately, which would avoid the need for serializing/deserializing large values with asn1crypto. I'm leaning towards that approach at the moment, which also has some other advantages for my purposes.

eukaryote avatar Oct 29 '15 13:10 eukaryote

An alternative would be to split the file into small chunks of a few kilobytes each. Info about the chunks would be encrypted with CMS. There is a tradeoff between the chunk size and the CMS size; maybe try to compute a chunk size that equals the CMS size. By using chunks you won't need to store the complete decrypted file before you can use it. As soon as you decrypt and verify each chunk, you can use it.
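One way the chunking could look, as a sketch only. The comment above does not say what the "info about the chunks" contains; here it is assumed to be a list of SHA-256 digests, one per chunk, which would then be protected with CMS. Encryption of the chunks themselves is left out, and all function names are hypothetical.

```python
import hashlib
import hmac
import io
from typing import BinaryIO, Iterator, List


def iter_chunks(fp: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            return
        yield chunk


def build_manifest(fp: BinaryIO, chunk_size: int = 4096) -> List[bytes]:
    """Assumed manifest format: one SHA-256 digest per chunk, in order."""
    return [hashlib.sha256(chunk).digest() for chunk in iter_chunks(fp, chunk_size)]


def verify_chunk(index: int, chunk: bytes, manifest: List[bytes]) -> bool:
    """Check a single chunk against the manifest so it can be used right away."""
    return hmac.compare_digest(hashlib.sha256(chunk).digest(), manifest[index])
```

With 32-byte digests, a larger chunk size shrinks the manifest but delays the point at which each piece of plaintext can be verified and used.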

joernheissler avatar Jul 01 '18 09:07 joernheissler

I have created this code to help with large CMS data: https://github.com/chevah/asn1stream

It is a basic ASN1 parser (for now) which is enough to navigate the ASN1 structure.

For example, when I reach a cms.RecipientInfos, I assume that the value is small, so I read it into memory and parse it with cms.RecipientInfos.load(self._stream_decoder.dump(tag))

When I reach encrypted_content OctetString I will just iterate over the content chunks.
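Iterating over the content chunks of a streamed OCTET STRING can be sketched as below. This is not the asn1stream API; `iter_octet_string` is a hypothetical helper. It assumes the universal OCTET STRING tag (0x04 primitive, 0x24 constructed with indefinite length) and primitive segments only. In CMS the encrypted_content field carries a context-specific [0] tag, so the outer tag byte would differ there.

```python
import io
from typing import BinaryIO, Iterator, Optional


def _read_exact(fp: BinaryIO, n: int) -> bytes:
    data = fp.read(n)
    if len(data) < n:
        raise EOFError("truncated input")
    return data


def _read_length(fp: BinaryIO) -> Optional[int]:
    """Return the content length, or None for the indefinite form (0x80)."""
    first = _read_exact(fp, 1)[0]
    if first < 0x80:
        return first
    if first == 0x80:
        return None
    return int.from_bytes(_read_exact(fp, first & 0x7F), "big")


def _iter_fixed(fp: BinaryIO, length: int, chunk_size: int) -> Iterator[bytes]:
    """Yield exactly `length` bytes from fp in pieces of at most chunk_size."""
    remaining = length
    while remaining:
        chunk = _read_exact(fp, min(chunk_size, remaining))
        remaining -= len(chunk)
        yield chunk


def iter_octet_string(fp: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the contents of an OCTET STRING without buffering it whole."""
    tag = _read_exact(fp, 1)[0]
    length = _read_length(fp)
    if tag == 0x04 and length is not None:
        yield from _iter_fixed(fp, length, chunk_size)   # primitive encoding
    elif tag == 0x24 and length is None:
        while True:                                      # constructed, indefinite
            seg_tag = _read_exact(fp, 1)[0]
            seg_len = _read_length(fp)
            if seg_tag == 0x00 and seg_len == 0:
                return                                   # end-of-contents octets
            if seg_tag != 0x04 or seg_len is None:
                raise ValueError("only primitive segments are handled here")
            yield from _iter_fixed(fp, seg_len, chunk_size)
    else:
        raise ValueError("unsupported tag/length combination in this sketch")
```

Each yielded chunk can be fed straight into an incremental decryptor, so memory use is bounded by `chunk_size` rather than by the size of the ciphertext.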

adiroiban avatar Aug 15 '20 14:08 adiroiban