sops icon indicating copy to clipboard operation
sops copied to clipboard

Support using comments to select parts to encrypt

Open mitar opened this issue 2 years ago • 1 comments

This PR adds support to annotate comments with a string (e.g., sops:enc) which can then be matched with a regex. If it matches, the corresponding value (the one which follows the comment) is encrypted while other values are not. (There is also the opposite regex available, to select those values which should not be encrypted.)

This enables the YAML file to have the same structure encrypted and decrypted, without having to add suffixes or manage complex regexes to match keys. See #543 for more discussion.

mitar avatar Dec 19 '21 23:12 mitar

Rebased this as well.

mitar avatar Mar 04 '22 10:03 mitar

Rebased to latest develop.

mitar avatar Sep 30 '22 11:09 mitar

I rebased to latest main.

mitar avatar Jul 17 '23 17:07 mitar

Hey guys, we are interested in the merging of this PR, it solves a long-standing issue of the --encrypted-suffix which requires us to perform sed -i 's/encrypted_suffix//g' on all our files after sops decryption.

The linked issue, #543, is 4 years old, and this change elegantly solves the issue.

Is there something I can do to help on this?

Gui13 avatar Aug 16 '23 18:08 Gui13

This did not make the cut for v3.8.0, as this version will contain many changes already (especially in the area of rewriting all key source implementations to their latest API clients). Making it risky to add much more on top of it. It is however scheduled to be looked at for v3.9.0.

hiddeco avatar Aug 16 '23 20:08 hiddeco

This PR now depends on https://github.com/getsops/sops/pull/1300.

mitar avatar Sep 22 '23 10:09 mitar

@felixfontein: I addressed all review comments. I moved unrelated changes to a separate commit which is also part of https://github.com/getsops/sops/pull/1300, so once that is merged it will not be in this PR anymore.

I also rebased to the current main branch.

mitar avatar Sep 22 '23 19:09 mitar

I rebased after #1300 was merged.

mitar avatar Sep 25 '23 09:09 mitar

Thank you for the review.

This is caused because ENC matches the encrypted comment.

Nice catch. I didn't really expect for people to use such simple regexes. Personally I use sops:enc which does not seem to have the issue you are describing (it cannot appear in encoded string).

Maybe we should just document this issue and suggest a regex (e.g., sops:enc)? So documentation could be something like: "do not pick a regex which can match encrypted values" (with link to the format of encrypted values)?

But it is not really nice for the security tool like this one to be able to be misconfigured.

Potentially shouldBeEncrypted should know whether it is decrypting, and if it is, and the regex for the comment matches, it should first check whether the value looks like an encrypted string.

I think we should do a check like:

  • Try to match the comment with encryptedValueRegexp (matching the structure of encrypted values) and with given regexp.
  • If only one of them match, good, we know what to do.
  • If both match, then we transform the comment by removing content matched by the encryptedValueRegexp and try to match given regexp again.
  • If it still matches, good, we know what to do.
  • If it does not match, we abort and complain that the regexp is too broad.

I think this is slightly better in sense that it informs the user about misconfiguration and that they should fix the regexp.

And at the same time, when it encounters such comments (that match the regex) during encryption, it should reject the file.

So during encryption, every time we encrypt any comment, we try to match the result with the given regexp, and if it does match the regexp, we abort and complain that the regexp is too broad. If I understand it correctly, I think this is fine.

So we should just make sure to help the user pick a good regexp by detecting too broad (or should we way "too simple"?) regexp and complain in such case.

mitar avatar Sep 25 '23 22:09 mitar

@felixfontein Do you agree with my proposal above?

mitar avatar Oct 11 '23 16:10 mitar

Hmm, reading all this again after some time mainly shows me that this is something we have to be very, very careful with.

One other solution that came to my mind when looking at this again: during encryption, once a comment is encrypted, check whether the encrypted comment matches UnencryptedCommentRegex or EncryptedCommentRegex (when provided). If any of them matches, reject the whole file.

felixfontein avatar Nov 05 '23 14:11 felixfontein

One other solution that came to my mind when looking at this again: during encryption, once a comment is encrypted, check whether the encrypted comment matches UnencryptedCommentRegex or EncryptedCommentRegex (when provided). If any of them matches, reject the whole file.

Hm, isn't this the same as we have been discussing already? I wrote:

So during encryption, every time we encrypt any comment, we try to match the result with the given regexp, and if it does match the regexp, we abort and complain that the regexp is too broad. If I understand it correctly, I think this is fine.

So I think this is great. We seems to be in agreement what to do when encrypting. Just check if the result matches the regexp and abort if it does.

But I think the question is what to do when decrypting. So I can imagine a scenario where you encrypted only few lines with a comment like sops:enc. Great. But then in the encrypted file you decide to change sops:enc to enc comment. And then you try to decrypt that with an updated CLI argument. What should happen? This is why I proposed:

I think we should do a check like:

  • Try to match the comment with encryptedValueRegexp (matching the structure of encrypted values) and with given regexp.
  • If only one of them match, good, we know what to do.
  • If both match, then we transform the comment by removing content matched by the encryptedValueRegexp and try to match given regexp again.
  • If it still matches, good, we know what to do.
  • If it does not match, we abort and complain that the regexp is too broad.

I think this is slightly better in sense that it informs the user about misconfiguration and that they should fix the regexp.

Maybe a simpler way would be to store (and MAC) used regexp to encrypt into config section of the file. And when decrypting only use the regexp from the config section. So if you want to change the regexp/comment, you have to decrypt and re-encrypt? In general it might be nice user experience to store all those CLI config switches into config section so that you do not have to specify them again when decrypting (and match them exactly, and know what was used when encrypting).

What do you think?

mitar avatar Nov 07 '23 22:11 mitar

One other solution that came to my mind when looking at this again: during encryption, once a comment is encrypted, check whether the encrypted comment matches UnencryptedCommentRegex or EncryptedCommentRegex (when provided). If any of them matches, reject the whole file.

Hm, isn't this the same as we have been discussing already? I wrote:

So during encryption, every time we encrypt any comment, we try to match the result with the given regexp, and if it does match the regexp, we abort and complain that the regexp is too broad. If I understand it correctly, I think this is fine.

So I think this is great. We seems to be in agreement what to do when encrypting. Just check if the result matches the regexp and abort if it does.

Yes, this part we agree upon.

But I think the question is what to do when decrypting. So I can imagine a scenario where you encrypted only few lines with a comment like sops:enc. Great. But then in the encrypted file you decide to change sops:enc to enc comment. And then you try to decrypt that with an updated CLI argument. What should happen?

IMO: if you modify the encrypted file so it won't decrypt anymore, it's your own fault if it fails.

This is why I proposed:

I think we should do a check like:

  • Try to match the comment with encryptedValueRegexp (matching the structure of encrypted values) and with given regexp.
  • If only one of them match, good, we know what to do.
  • If both match,

In case both match, I would simply error out. The user did something wrong (assuming we don't have a bug) and they need to fix it manually. TBH I would simply check the encryptedValueRegexp regular expression, and if it matches, assume that the comment is encrypted (and not even check the other regexp).

then we transform the comment by removing content matched by the encryptedValueRegexp and try to match given regexp again.

  • If it still matches, good, we know what to do.
  • If it does not match, we abort and complain that the regexp is too broad.

I think this is slightly better in sense that it informs the user about misconfiguration and that they should fix the regexp.

I would avoid that. It makes decryption less efficient (two regular expressions to test instead of potentially only one per comment), and increases complexity for a situation that should not arise if users do not modify sops encrypted files by hand.

Maybe a simpler way would be to store (and MAC) used regexp to encrypt into config section of the file.

Right now the MAC only checks the message, and not the metadata. I would avoid extending it. We could add another MAC which only covers metadata, but also that is something that should not happen in this PR.

And when decrypting only use the regexp from the config section.

That's what should happen anyway.

So if you want to change the regexp/comment, you have to decrypt and re-encrypt? In general it might be nice user experience to store all those CLI config switches into config section so that you do not have to specify them again when decrypting (and match them exactly, and know what was used when encrypting).

If you have to specify any switch again when decrypting, it is a bug that needs to be fixed ASAP. The only options that should affect decryption are general things like --output, --output-type, --ignore-mac.

felixfontein avatar Dec 19 '23 13:12 felixfontein

I rebased this PR to do some (unrelated) work on top of it; the resulting commit is contained in #1387.

felixfontein avatar Dec 26 '23 09:12 felixfontein

Thanks. I will look into this again after holidays.

mitar avatar Dec 26 '23 13:12 mitar

BTW, if you are currently working on this and you are mentally in this logic and willing, feel free to wrap up with this PR yourself. If it is easier for you.

mitar avatar Dec 26 '23 14:12 mitar

Not right now, but I might try to invest more time in this during the next days.

felixfontein avatar Dec 26 '23 15:12 felixfontein

I'll work on it in #1392.

felixfontein avatar Dec 27 '23 16:12 felixfontein

This is continued in #1392.

mitar avatar Jan 04 '24 12:01 mitar