pycryptodome
Allowing unicode character compatibility for PBKDF2
Changed utf-8 to unicode-escape in the bcrypt function and allowed "password" and "salt" in the PBKDF2 function to be compatible with unicode character encoding.
I did this because the passwords/keys and salts I generate contain Unicode characters, and while trying to encrypt data (using oCrypt0r) I kept getting the following error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u12a0' in position 0: ordinal not in range(256)
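The error can be reproduced with the standard library alone. This is a minimal sketch: it assumes only that the string ends up being encoded as Latin-1, as the traceback indicates, and it does not call pycryptodome.

```python
# The character from the traceback above (code point U+12A0).
password = "\u12a0"

try:
    password.encode("latin-1")  # Latin-1 only covers code points 0..255
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print(exc)

# UTF-8 can represent any Unicode code point, so this succeeds.
utf8_bytes = password.encode("utf-8")
print(utf8_bytes)
```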
Only after looking at the PBKDF2 function in KDF.py did I see that its tobytes() calls did not specify an encoding, while the tobytes() call in bcrypt() used the utf-8 encoding. So I gave the tobytes() call in bcrypt() and the two calls in PBKDF2() the unicode-escape encoding. I no longer get the error above, and my data is now encrypted as expected when I use Unicode characters.
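To see what the unicode-escape codec actually produces, the following sketch compares it with utf-8 and latin-1 using only the standard library. The sample strings are illustrative. Note that unicode-escape emits the ASCII text of a Python escape sequence rather than the character's bytes, and it also doubles literal backslashes, so even some ASCII-only inputs encode differently.

```python
# Sample inputs: a Latin-1 character, a character outside Latin-1,
# and an ASCII string containing a literal backslash (a, backslash, b).
samples = ["ü", "\u12a0", "a\\b"]

for text in samples:
    escaped = text.encode("unicode-escape")
    utf8 = text.encode("utf-8")
    print(repr(text), escaped, utf8)

u_escaped = "ü".encode("unicode-escape")   # four ASCII bytes: \ x f c
u_latin1 = "ü".encode("latin-1")           # the single byte 0xFC
wide_escaped = "\u12a0".encode("unicode-escape")
backslash_escaped = "a\\b".encode("unicode-escape")
```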
I am unsure what effect this may have, but it fixed my problem and allowed me to encrypt my data using keys/passwords and salts with Unicode characters.
As far as I can tell, everything has run smoothly so far when encrypting my data.
Ignore the typo in "Changed utf-8 to unicode-escape in the bcrypt function and allowed "password" and "salt" in the PKDF2 function to be compatible with unicode character encoding." I meant PBKDF2.
The PR does break existing code:
>>> KDF.bcrypt("ü".encode('unicode-escape'), 4, b"\x00"*16) # Equivalent to behavior after the PR
b'$2a$04$......................4AbT0Bs92Ij2TEOAbNTgoIoX5hGK5MW'
>>> KDF.bcrypt("ü", 4, b"\x00"*16) # Without the PR applied
b'$2a$04$......................tPR34n3qpSzDqGFTSHyuqN.cvsW8RuG'
>>> KDF.PBKDF2("ü".encode('unicode-escape'), "ü".encode('unicode-escape')) # Equivalent to behavior after the PR
b'`\x1c\xb4;$\x02,)T\x8c\x1fzO\x7f\x94('
>>> KDF.PBKDF2("ü", "ü") # Without the PR applied
b'{\x19\xda\xb4)\x0fh\x9f\xc0\xee\xd1\xbf\x19\xec\x0cv'
Nevertheless, the implicit use of Latin-1/ISO 8859-1 should be documented for PBKDF2, e.g. "Unicode strings will be encoded as ISO 8859-1 (also known as Latin-1); this does not allow any characters with code points > 255."
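Given that behavior, one way for a caller to avoid the error without changing the library is to encode the password and salt to bytes explicitly before deriving a key. The sketch below illustrates the idea with the standard library's hashlib.pbkdf2_hmac rather than pycryptodome; the helper name derive_key, the choice of SHA-1, the iteration count, and the key length are all assumptions made for illustration.

```python
import hashlib


def derive_key(password: str, salt: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode explicitly, then run PBKDF2-HMAC-SHA1."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode(encoding),  # the caller decides the encoding
        salt.encode(encoding),
        1000,                       # illustrative iteration count
        dklen=16,                   # illustrative key length in bytes
    )


key_utf8 = derive_key("ü", "ü")
key_latin1 = derive_key("ü", "ü", encoding="latin-1")
key_wide = derive_key("\u12a0", "\u12a0")  # works: UTF-8 covers U+12A0
```

The two keys for "ü" differ because the encoded bytes differ, which is the same reason a change of default encoding inside the library would break existing callers.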
Closing as this would indeed break backward compatibility. The recommendation of @Varbin is included as 3a8efd0ef1c03ca2b4e9207a3eba3c5a6fcdfcd7.