Encoding/decoding very large values
Is there a way to use asn1crypto for encoding/decoding values that are too large to fit in memory? My use case is using asn1crypto.cms with very large encrypted data values.
Based on a quick search that turned up resources like https://www.ietf.org/mail-archive/web/smime/current/msg01335.html, the "indefinite length encoding" and "constructed encoding" features of ASN.1 look relevant.
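For context, this is roughly what a constructed, indefinite-length encoding of an OCTET STRING looks like at the byte level (a hand-assembled illustration; the chunk contents and sizes are arbitrary):

```python
# A constructed, indefinite-length OCTET STRING: the value is split into
# definite-length chunks and terminated by an end-of-contents marker, so a
# writer can start emitting data before knowing the total length.
chunks = [b'Hel', b'lo']

encoded = bytes([0x24, 0x80])  # constructed OCTET STRING tag, indefinite length
for chunk in chunks:
    # each chunk is itself a small, definite-length OCTET STRING
    encoded += bytes([0x04, len(chunk)]) + chunk
encoded += bytes([0x00, 0x00])  # end-of-contents marker

print(encoded.hex())  # 2480040348656c04026c6f0000
```

As noted in the reply below, asn1crypto can already parse indefinite-length values, so the open question is streaming them rather than decoding them.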
For these large values, are you decrypting and saving them to disk?
asn1crypto was developed to deal with files that fit in memory. Most of the structures are used in some form or another for X.509 certificates, CRLs, OCSP responses or PKCS#12 files.
I am thinking that for this to work, there would need to be a way to pass a file object to Asn1Value.load() instead of a byte string. Then, each object would copy any necessary header and footer bytes, but the contents would be a reference to the file object, plus an offset and length. That way the file contents would never need to be copied into memory for large chunks of data.
For serializing objects via .dump(), there would need to be an optional file object parameter that would accept all of the output.
While the library can parse indefinite length values, I think the real challenge will be how to restructure the existing functionality to treat the .contents attribute of each Asn1Value object as something other than just a byte string.
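A rough sketch of what such a lazy contents reference could look like; nothing like this exists in asn1crypto today, and the class and method names below (FileContents, read(), dump_to()) are hypothetical:

```python
class FileContents:
    """Hypothetical lazy stand-in for the .contents byte string: instead of
    holding the value bytes, it remembers where they live in a file."""

    def __init__(self, fileobj, offset, length):
        self._fileobj = fileobj
        self.offset = offset
        self.length = length

    def read(self, chunk_size=64 * 1024):
        """Yield the referenced bytes in chunks without loading them all."""
        self._fileobj.seek(self.offset)
        remaining = self.length
        while remaining > 0:
            chunk = self._fileobj.read(min(chunk_size, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            yield chunk

    def dump_to(self, out):
        """Copy the referenced bytes into another file object."""
        for chunk in self.read():
            out.write(chunk)
```

The idea being that an Asn1Value parsed from a file object would keep only its header and trailer bytes plus one of these references, and .dump() with a file object argument would stream via dump_to() instead of materializing the contents in memory.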
Yes, the decryption of the large values (things like EncryptedContent inside a CMS EncryptedContentInfo) happens incrementally, writing output to a temporary file and then verifying the authentication tag before doing anything else with the plaintext.
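A minimal sketch of that incremental flow, assuming an AES-GCM EncryptedContent and the cryptography package; the function name and parameters are illustrative, not anything from asn1crypto:

```python
import tempfile

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def decrypt_to_tempfile(ciphertext_stream, key, nonce, tag, chunk_size=64 * 1024):
    """Incrementally decrypt an AES-GCM stream into a temporary file.

    The authentication tag is only checked by finalize(), so nothing should be
    done with the plaintext file until this function returns without raising.
    """
    decryptor = Cipher(algorithms.AES(key), modes.GCM(nonce, tag)).decryptor()
    with tempfile.NamedTemporaryFile(delete=False) as plaintext:
        while True:
            chunk = ciphertext_stream.read(chunk_size)
            if not chunk:
                break
            plaintext.write(decryptor.update(chunk))
        decryptor.finalize()  # raises InvalidTag if authentication fails
        return plaintext.name
```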
The EncryptedContent is optional in CMS, so I could store the ciphertext blob separately, which would avoid the need for serializing/deserializing large values with asn1crypto. I'm leaning towards that approach at the moment, which also has some other advantages for my purposes.
An alternative would be to split the file into small chunks, a few kilobytes each, and encrypt the info about the chunks with CMS. There is a tradeoff between the chunk size and the CMS size; maybe try to compute a chunk size that roughly equals the CMS size. By using chunks you won't need to store the complete decrypted file before you can use it: as soon as you decrypt and verify each chunk, you can use it.
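A minimal sketch of the chunking idea, assuming each chunk is sealed independently with AES-GCM via the cryptography package; the chunk size, framing, and nonce handling here are illustrative choices, not part of CMS or asn1crypto:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 64 * 1024  # illustrative; tune against the per-chunk overhead


def encrypt_chunks(infile, outfile, key):
    """Encrypt a file as a sequence of independently authenticated chunks."""
    aead = AESGCM(key)
    index = 0
    while True:
        chunk = infile.read(CHUNK_SIZE)
        if not chunk:
            break
        nonce = os.urandom(12)
        # bind the chunk index as associated data so chunks cannot be reordered
        sealed = aead.encrypt(nonce, chunk, index.to_bytes(8, 'big'))
        outfile.write(nonce + len(sealed).to_bytes(4, 'big') + sealed)
        index += 1


def decrypt_chunks(infile, key):
    """Yield plaintext chunks as soon as each one is decrypted and verified."""
    aead = AESGCM(key)
    index = 0
    while True:
        nonce = infile.read(12)
        if not nonce:
            break
        length = int.from_bytes(infile.read(4), 'big')
        sealed = infile.read(length)
        yield aead.decrypt(nonce, sealed, index.to_bytes(8, 'big'))
        index += 1
```

Binding the chunk index as associated data prevents silent reordering of chunks; a real design would also need to detect truncation of the final chunks, for example by authenticating the total chunk count.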
I have created this code to help with large CMS data: https://github.com/chevah/asn1stream
It is a basic ASN.1 parser (for now), which is enough to navigate the ASN.1 structure.
For example, when I reach a cms.RecipientInfos I assume the tag is small, read the value into memory, and parse it with cms.RecipientInfos.load(self._stream_decoder.dump(tag)).
When I reach the encrypted_content OctetString, I just iterate over the content chunks.
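To tie this together, a minimal sketch of that pattern; only cms.RecipientInfos.load() is real asn1crypto API, while the decoder object, its iteration/dump()/iter_chunks() methods, and the decrypt_chunk callback are hypothetical placeholders for the streaming parser and decryption routine:

```python
from asn1crypto import cms


def process_cms_stream(decoder, plaintext_out, decrypt_chunk):
    """Walk a CMS stream: load small structures fully, stream the large one.

    decoder stands in for a streaming ASN.1 parser (hypothetical API) and
    decrypt_chunk for a caller-supplied incremental decryption callback.
    """
    recipient_infos = None
    for tag in decoder:  # hypothetical: yield tags as they are parsed
        if tag.name == 'recipient_infos':
            # small enough to hold in memory, so hand it to asn1crypto
            recipient_infos = cms.RecipientInfos.load(decoder.dump(tag))
        elif tag.name == 'encrypted_content':
            # too large for memory: stream the OCTET STRING chunk by chunk
            for chunk in decoder.iter_chunks(tag):
                plaintext_out.write(decrypt_chunk(chunk))
    return recipient_infos
```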