Incorrect Padding

Open therealOri opened this issue 3 years ago • 1 comments

Lately I have been having an issue with encrypting and decrypting data that contains Unicode characters.

Traceback (most recent call last):
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 691, in <module>
    change_creds()
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 401, in change_creds
    vne_pwords = stringE2(z)
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 117, in stringE2
    vne_key = PBKDF2(vne_ev_password, vne_salt, dkLen=32)
  File "/home/ori/Desktop/Projects/PassGen_dev/pnewgENV/lib/python3.10/site-packages/Crypto/Protocol/KDF.py", line 140, in PBKDF2
    password = tobytes(password)
  File "/home/ori/Desktop/Projects/PassGen_dev/pnewgENV/lib/python3.10/site-packages/Crypto/Util/py3compat.py", line 130, in tobytes
    return s.encode(encoding)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 6: ordinal not in range(256)

using utf-8 for tobytes() functions in the KDF.py file.

I made a pull request and made my own patch to fix this error.




However, recently I've been getting an error telling me the padding is incorrect. I want to stress that none of the credentials used for decrypting have been changed or altered in any way, and they work just fine when I revert my patch and use utf-8 in the KDF.py file. ONLY after I apply my patch, or use "unicode-escape" for the tobytes() function, does the padding somehow change and throw the error, EVEN THOUGH NOTHING HAS CHANGED.

Traceback (most recent call last):
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 691, in <module>
    change_creds()
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 391, in change_creds
    pwords = stringD_lst(y)
  File "/home/ori/Desktop/Projects/PassGen_dev/passgen.py", line 106, in stringD_lst
    raise Exception(strd_e) from None
Exception: The provided credentials do not match what was was used to encrypt the data...
Error: Padding is incorrect.

using unicode-escape for tobytes() functions in the KDF.py file.




I have no idea whether this is a bug or what is causing it, but this shouldn't be happening. Why in the world would changing utf-8 to unicode-escape (which encompasses utf-8 anyway) break the padding and make it not match? I also want to point out again that my encryption key and salt are exactly the same and have not been altered in any way, and they work just fine when using utf-8 encoding. The one exception: if I started the encryption USING unicode-escape, there are no issues and it decrypts just fine.

Basically the following is what I'm doing and what is giving me headaches and the errors above. Any help would be great.

#Changing THIS
password = tobytes(password)
salt = tobytes(salt)

#To THIS
password = tobytes(password, 'unicode-escape')
salt = tobytes(salt, 'unicode-escape')

If you would like to know the project in question and get a better look at my code: PassGen is where you'll find the latest code.

Attached is the code I'm currently working on, which hasn't been pushed yet: passgen.tar.gz

therealOri avatar Jul 21 '22 00:07 therealOri

@therealOri Do you mind creating a minimal example that reproduces this error (i.e. a piece of code that includes only what is required to trigger it and nothing more)?

After skimming your script, it seems to me you want to encrypt data with a password. The usual order of operations is:

  1. Key derivation
    1. You have the password as a string.
    2. Convert this to bytes with an explicit encoding: e.g. password_bytes = password.encode('utf-8')
    3. Generate or load the salt. Convert with an explicit encoding to bytes.
    4. Apply PBKDF2 over the salt and password (both bytes!).
  2. Encryption
    1. You have the derived key as bytes and the input (plain) text as a string.
    2. Encode the plain text with an explicit encoding: plain_bytes = plain.encode('utf-8') and add padding to the bytes.
    3. Generate an IV/nonce
    4. Encrypt the padded bytes.
    5. Add a MAC over the encrypted bytes + IV/nonce if you are not using an authenticated mode like GCM or EAX. If you do not use an authenticated mode, it is highly recommended to use different keys for authentication and encryption.
    6. Store IV/nonce, ciphertext, MAC
  3. Decryption
    1. You have the derived key and IV/nonce, ciphertext and MAC
    2. Verify the MAC or authentication tag
    3. Decrypt the data
    4. Remove the padding
    5. Decode the decrypted bytes to a string with an explicit encoding, e.g. plain_text = deciphered_unpadded_plain_bytes.decode('utf-8')

The store/read between encryption step 6 and decryption step 1 is assumed to return the same data as stored.

If this scheme is applied consistently (this is not specific to PyCryptodome; it is independent of language, library and protocol), PyCryptodome works as intended. Of course, the encoding must be the same for encryption and decryption. Using a different encoding will lead to different keys or data, and therefore to issues.
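
To make the encoding point concrete, here is a small sketch that uses only the standard library. The password and salt are hypothetical, and hashlib.pbkdf2_hmac with HMAC-SHA1 and 1000 iterations is used as a stand-in for the PBKDF2 call in the traceback, on the assumption that this resembles the library defaults.

```python
import hashlib

password = "passwd€"        # hypothetical password, '€' (U+20AC) at position 6
salt = b"0123456789abcdef"  # fixed salt, for illustration only

utf8_bytes = password.encode("utf-8")
escaped_bytes = password.encode("unicode-escape")

print(utf8_bytes)     # b'passwd\xe2\x82\xac'
print(escaped_bytes)  # b'passwd\\u20ac'


def derive(password_bytes):
    # Same salt, same parameters; only the password bytes differ.
    return hashlib.pbkdf2_hmac("sha1", password_bytes, salt, 1000, dklen=32)


key_utf8 = derive(utf8_bytes)
key_escaped = derive(escaped_bytes)
print(key_utf8 == key_escaped)  # False
```

The same string becomes 9 bytes under utf-8 and 12 bytes under unicode-escape, so PBKDF2 sees two different passwords and returns two unrelated keys.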

The "utf-8" in the above examples can be freely replaced with any other encoding known to Python, as long as the replacement is used consistently (even 'unicode-escape', although this is rather uncommon). If I had to guess: somewhere in your code, this is not happening consistently. The data is therefore decrypted with the wrong key, which yields garbage. Garbage usually does not have valid padding, which leads to the above error.

Or you are trying to decrypt data that was created before your patch, in which case the derived key is not the same one.
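
Why garbage usually fails the padding check can be shown with a simplified pure-Python PKCS#7 routine. This is a stand-in written for illustration, not the library's actual implementation, and the garbage block is simply a fixed byte pattern playing the role of data decrypted with the wrong key.

```python
BLOCK_SIZE = 16


def pkcs7_pad(data):
    """Append n bytes of value n so the length is a multiple of BLOCK_SIZE."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([n]) * n


def pkcs7_unpad(padded):
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % BLOCK_SIZE:
        raise ValueError("Input is not a whole number of blocks.")
    n = padded[-1]
    if n < 1 or n > BLOCK_SIZE or padded[-n:] != bytes([n]) * n:
        raise ValueError("Padding is incorrect.")
    return padded[:-n]


good = pkcs7_pad(b"hunter2")
print(pkcs7_unpad(good))  # b'hunter2'

# Stand-in for the output of decrypting with the wrong key: the last byte
# is 0x4f, which is not a valid padding length for a 16-byte block.
garbage = bytes(range(0x40, 0x50))
try:
    pkcs7_unpad(garbage)
except ValueError as exc:
    print(exc)  # Padding is incorrect.
```

Only a small fraction of random blocks happen to end in a valid padding pattern, so a wrong key almost always shows up as this error rather than as readable but wrong text.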

Varbin avatar Jul 28 '22 16:07 Varbin