bool values are not encrypted adequately
I realise there aren't many people out there using bools as secrets, but...
input:
example_booleans:
- true
- false
- true
- false
- true
output:
example_booleans:
- ENC[AES256_GCM,data:g/Obfg==, ... ,type:bool]
- ENC[AES256_GCM,data:NMMyM14=, ... ,type:bool]
- ENC[AES256_GCM,data:g/Obfg==, ... ,type:bool]
- ENC[AES256_GCM,data:NMMyM14=, ... ,type:bool]
- ENC[AES256_GCM,data:g/Obfg==, ... ,type:bool]
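The amount of = padding in the data field makes it obvious which values are true and which are false: the encrypted data is exactly as long as the plaintext, so "true" (4 bytes) and "false" (5 bytes) end up with different base64 padding.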
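To make the leak concrete, here is a minimal Python sketch. It does not call SOPS or AES at all: `data_field` and `guess_bool` are hypothetical helpers, and random bytes stand in for the ciphertext, on the assumption (consistent with the data fields shown above) that the encrypted data has the same length as the plaintext.

```python
import base64
import os


def data_field(plaintext: bytes) -> str:
    # Stand-in for the ciphertext: random bytes of the same length as the
    # plaintext (assumption: the data field carries no extra bytes).
    ciphertext = os.urandom(len(plaintext))
    return base64.b64encode(ciphertext).decode("ascii")


def guess_bool(field: str) -> bool:
    # No key needed: 4 bytes encode with "==" padding, 5 bytes with "=".
    return field.endswith("==")


true_field = data_field(b"true")
false_field = data_field(b"false")
print(true_field, guess_bool(true_field))
print(false_field, guess_bool(false_field))
```

Both fields are 8 characters long, so only the padding count differs, and that alone is enough to tell the two values apart.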
Padding the plaintext before encryption, up to e.g. the next 16-byte boundary, would help conceal the exact length of the data being encrypted. That length is a dead giveaway for bools, and also not ideal for other types. The padding could be a null followed by a repetition of the input data, so that it's deterministic.
Actually I might be over-complicating that; null padding is likely good enough.
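A rough sketch of the null-padding idea in Python, purely for illustration. `pad` and `unpad` are hypothetical names, not anything SOPS provides, and `unpad` assumes the original value never ends in a null byte.

```python
BLOCK = 16  # boundary suggested above; any fixed size would work the same way


def pad(plaintext: bytes, block: int = BLOCK) -> bytes:
    # Always append at least one null, then fill up to the next boundary.
    padded_len = (len(plaintext) // block + 1) * block
    return plaintext + b"\x00" * (padded_len - len(plaintext))


def unpad(padded: bytes) -> bytes:
    # Assumption: the original value does not itself end in null bytes.
    return padded.rstrip(b"\x00")


print(len(pad(b"true")), len(pad(b"false")))
```

With this, "true" and "false" both pad to the same length, so the length of the encrypted data no longer distinguishes them. Values of very different lengths would still fall into different 16-byte buckets.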
Also, I notice from above that repeated occurrences of the same value in the same array seem to result in the same encrypted data. This issue also afflicts strings, e.g.
input:
some_identical_secrets:
- ghewkjgewhjgew
- blbooefef
- ghewkjgewhjgew
output:
some_identical_secrets:
- ENC[AES256_GCM,data:fDHnMO6reuPTRWT4lZE=, ... ,type:str]
- ENC[AES256_GCM,data:6ePlJfxUkPEN, ... ,type:str]
- ENC[AES256_GCM,data:fDHnMO6reuPTRWT4lZE=, ... ,type:str]
so I wonder whether the position in the array could also be used as input to the encryption to avoid this.
A related issue is #815. Solving this requires a new protocol format; I've created #1726 to track a "wishlist" for it.
Including the index would break one important feature of SOPS: if you modify a file (for example by adding a list entry), the diff of the encrypted files shows only that change (besides some changes to the metadata). In the above case, if the list index were used and you added a new list item in the middle, all entries after it would also change in the encrypted file.
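The effect can be illustrated with a small Python sketch. This is not how SOPS encrypts anything: an HMAC over "index:value" with a made-up key stands in for a hypothetical scheme that binds each ciphertext to its list position.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key, for illustration only


def index_bound(values):
    # Stand-in for "encryption that takes the list index as input".
    return [
        hmac.new(KEY, f"{i}:{v}".encode(), hashlib.sha256).hexdigest()[:16]
        for i, v in enumerate(values)
    ]


before = index_bound(["a", "b", "c"])
after = index_bound(["a", "new", "b", "c"])  # one item inserted in the middle

# Entries of the new file whose encrypted form already existed in the old one.
unchanged = [c for c in after if c in before]
print(len(unchanged), "of", len(after), "entries unchanged")
```

Only the entry before the insertion point keeps its encrypted form; "b" and "c" shift by one position and change, even though their plaintext did not.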
The only reason right now that two identical values encrypt to the same result is this feature. Figuring out after editing a file which values have just been moved and shouldn't be re-encrypted, and which values have been changed or added and need to be encrypted, is quite a hard problem. Right now this is done by a simple cache, which is initialized with the previous content of the file: every value maps to its encryption. When a value is to be encrypted, the cache is checked. If the value appears in there, the encrypted value is taken from there. If not, the value is encrypted (with a random IV / salt) and the cache is updated.
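The cache described above can be sketched in a few lines of Python. This is a simplified illustration of the description, not SOPS's actual code: `fake_encrypt` and `encrypt_values` are hypothetical, and a random token stands in for AES-GCM with a fresh IV.

```python
import secrets


def fake_encrypt(value: str) -> str:
    # Placeholder for real encryption with a random IV: differs on every call.
    return f"ENC[{secrets.token_hex(8)}]"


def encrypt_values(values, previous=None):
    # `previous` maps plaintext -> ciphertext from the earlier file version.
    cache = dict(previous or {})
    out = []
    for value in values:
        if value not in cache:
            cache[value] = fake_encrypt(value)  # new or changed value
        out.append(cache[value])                # moved or repeated value
    return out, cache


first, cache = encrypt_values(["bar", "baz", "bar"])
second, _ = encrypt_values(["something else", "baz", "bar"], cache)
print(first)
print(second)
```

Here the two occurrences of "bar" share one ciphertext, which is exactly the behaviour reported in this issue, and "baz" and "bar" keep their ciphertexts in the edited version, which is what keeps the encrypted diff small.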
Preventing the same value from encrypting to the same ciphertext (for example by using a different IV / salt every time) without destroying the minimal-diff property of encrypted files is quite hard to do. For example, consider a file:
foo:
- bar
- baz
- bam
which is edited to
foo:
- something else
boo:
- baf
- baz
- bam
Ensuring that the encryptions of baz and bam don't change, so that the diff of the encrypted files corresponds to the diff of the unencrypted files (up to changes in the metadata, of course), is quite hard.