pycryptodome

Allowing unicode character compatibility for PBKDF2

Open therealOri opened this issue 2 years ago • 2 comments

Changed utf-8 to unicode-escape in the bcrypt function and allowed "password" and "salt" in the PBKDF2 function to be compatible with unicode character encoding.

I did this because the passwords/keys and salts I am generating use unicode characters and while trying to encrypt data (using oCrypt0r) it kept spitting out the following error.

UnicodeEncodeError: 'latin-1' codec can't encode character '\u12a0' in position 0: ordinal not in range(256)

After looking at the PBKDF2 function in the KDF.py file, I saw that its tobytes() calls did not specify an encoding, while the tobytes() call in bcrypt() used utf-8. So I gave the tobytes() call in bcrypt() and the two in PBKDF2() the unicode-escape encoding. I no longer get the error above, and my data now gets encrypted as it should when I use unicode characters.
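The failure and a caller-side alternative can be sketched as follows. This assumes, as the traceback suggests, that a str passed without an explicit encoding is encoded as Latin-1. The standard library's hashlib.pbkdf2_hmac is used here as a stand-in for KDF.PBKDF2, and the helper name derive_key and its parameter values are hypothetical.

```python
import hashlib

password = "\u12a0secret"  # contains a code point above 255
salt = "\u12a0salt"

# Latin-1 can only represent code points 0..255, so encoding this string fails.
try:
    password.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False


def derive_key(password: str, salt: str, count: int = 1000, dklen: int = 16) -> bytes:
    """Hypothetical helper: encode explicitly as UTF-8, then hand bytes to the KDF."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),
        salt.encode("utf-8"),
        count,
        dklen,
    )


key = derive_key(password, salt)
```

Encoding on the caller's side keeps the choice of encoding explicit, so the library would not need to guess one for str input.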

I am unsure what effect this may have, but it fixed my problem and allowed me to encrypt my data using keys/passwords and salts with unicode characters.

As far as I can tell, everything has run smoothly so far when encrypting my data.

therealOri avatar Jul 07 '22 20:07 therealOri

Ignore the typo in

Changed utf-8 to unicode-escape in the bcrypt function and allowed "password" and "salt" in the PKDF2 function to be compatible with unicode character encoding.

I meant PBKDF2

therealOri avatar Jul 07 '22 20:07 therealOri

The PR does break existing code:

>>> from Crypto.Protocol import KDF
>>> KDF.bcrypt("ü".encode('unicode-escape'), 4, b"\x00"*16)  # Equivalent to behavior after the PR
b'$2a$04$......................4AbT0Bs92Ij2TEOAbNTgoIoX5hGK5MW'
>>> KDF.bcrypt("ü", 4, b"\x00"*16)  # Without the PR applied
b'$2a$04$......................tPR34n3qpSzDqGFTSHyuqN.cvsW8RuG'

>>> KDF.PBKDF2("ü".encode('unicode-escape'), "ü".encode('unicode-escape'))  # Equivalent to behavior after the PR
b'`\x1c\xb4;$\x02,)T\x8c\x1fzO\x7f\x94('
>>> KDF.PBKDF2("ü", "ü")  # Without the PR applied
b'{\x19\xda\xb4)\x0fh\x9f\xc0\xee\xd1\xbf\x19\xec\x0cv'

Nevertheless, the implicit use of Latin-1/ISO 8859-1 should be documented for PBKDF2, e.g. "Unicode strings will be encoded as ISO 8859-1 (also known as Latin-1), this does not allow any characters with codepoints > 255."
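The reason the outputs above differ can be illustrated with plain Python string encoding, independent of the library. This is only a sketch of the byte-level difference, not of the library's internals: the same character yields different byte strings under each encoding, so any key derived from it changes as well.

```python
s = "ü"

latin1 = s.encode("latin-1")          # one byte: 0xFC
utf8 = s.encode("utf-8")              # two bytes: 0xC3 0xBC
escaped = s.encode("unicode-escape")  # four ASCII bytes: backslash, 'x', 'f', 'c'

for name, data in (("latin-1", latin1), ("utf-8", utf8), ("unicode-escape", escaped)):
    print(f"{name:15} {data!r} ({len(data)} bytes)")
```

Since a KDF operates on the bytes it is given, switching the implicit encoding would change the derived output for every existing non-ASCII password or salt.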

Varbin avatar Jul 28 '22 16:07 Varbin

Closing, as this would indeed break backward compatibility. The recommendation of @Varbin is included in commit 3a8efd0ef1c03ca2b4e9207a3eba3c5a6fcdfcd7.

Legrandin avatar Dec 11 '22 22:12 Legrandin