ctypescrypto icon indicating copy to clipboard operation
ctypescrypto copied to clipboard

Difficult to use for binary data in py3

Open parneshraniga opened this issue 5 years ago • 5 comments

I have used the library with py2 and now I am moving my projects to py3. The bindings are designed to work with text data which was not a big issue with py2 where bytes and str are analogous but this fails in py3 where str is unicode.

I have hacked the code for CMS to make it work for binary data under py3 (basically assume that bytes are passed in and returned) and this can be explicitly converted to str by the user is need be. Happy to have a closer look and provide a non-hacky patch.

parneshraniga avatar Sep 04 '19 00:09 parneshraniga

Really CMS stands for "cryptrographic MESSAGE syntax" and messages typically are human readable. So, I don't think that it is good idea to interpret them as binary data by default.

So, if you are going to submit me patch, please make sure that it is able to have both str and bytes transparently for user (or may be not so transparently, i.e. using keyword args).

vbwagner avatar Sep 04 '19 03:09 vbwagner

Thanks @vbwagner. I am trying to implement some encryption for medical imaging files (ftp://medical.nema.org/medical/dicom/final/sup55_ft.pdf). Looking at the RFC 2630, it looks the byte arrays are inputs and outputs.

In any case, I will see if I can easily modify the code to preserve currently functionality and also get bytes out.

parneshraniga avatar Sep 04 '19 04:09 parneshraniga

Of course, you are right. Binary data ought to be supported by encryption/signature formats. But I have to think a bit how to make interface more consistent and easy to use,

I've already solved similar problems in ctypescrypto.bio and ctypescrypto.x509 modules. And it seems that support for binary and character data is not consistent across all ctypescrypto suibmodules. Now BIO accepts both character and binary data, cipher and digest - require binary, and CMS - text. Cipher returns binary data and CMS decrypt - text.

Serialization of CMS into DER format via str fuction is obviosly error in Python3. It goes unnoticed only because there is no tests for this module. DER format should be binary.

vbwagner avatar Sep 04 '19 06:09 vbwagner

@vbwagner. Thanks for the info. I have made some minor changes whereby decrypt return bytes if Binary flag is set and string otherwise. To support this, I have added a bytes function to CMSBase. However, it might be worth being more explicit with DER format similar to how the PEM function is explicit.

parneshraniga avatar Sep 05 '19 01:09 parneshraniga

I can confirm this, the __str__ method of CMSBase does not works in python3 as it does in python2, the error is:

File "/usr/local/lib/python3.6/dist-packages/ctypescrypto/cms.py", line 94, in __str__ return str(bio)
File "/usr/local/lib/python3.6/dist-packages/ctypescrypto/bio.py", line 57, in __unicode__ return self.__bytes__().decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte

While on python2 works as expected.

manuel-arguelles avatar Oct 18 '19 17:10 manuel-arguelles