scylla-operator icon indicating copy to clipboard operation
scylla-operator copied to clipboard

Add integrity information to must-gather

Open tnozicka opened this issue 11 months ago • 14 comments

Description of your changes: This PR add integrity information to must-gather dump. This will help to identify whether files have been removed or the collection failed.

Which issue is resolved by this Pull Request: Resolves #1847

tnozicka avatar Mar 19 '24 13:03 tnozicka

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tnozicka

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

I only had a brief look - wouldn't a merkle tree be a standard approach for this use case?

It's valid but more for cryptography.

The approach here allows you to easily identify the places with changes with respect to the particular resource and couples well with linux CLI tools. Say, to verify a folder manually, you can run find scylla-operator-must-gather/namespaces/scylla-operator/serviceaccounts -mindepth 1 -exec cat {} \; | sha512sum - and you have the explicit count so you at least know how many files are missing.

tnozicka avatar Mar 21 '24 08:03 tnozicka

@tnozicka: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gke-release-script-latest 68232b504f289b214df48a950faad6eab6c6d620 link true /test e2e-gke-release-script-latest

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

I only had a brief look - wouldn't a merkle tree be a standard approach for this use case?

It's valid but more for cryptography.

The approach here allows you to easily identify the places with changes with respect to the particular resource and couples well with linux CLI tools. Say, to verify a folder manually, you can run find scylla-operator-must-gather/namespaces/scylla-operator/serviceaccounts -mindepth 1 -exec cat {} ; | sha512sum -

would this approach also allow you to check if a given file was originally a part of the archive?

and you have the explicit count so you at least know how many files are missing.

is that actually a valuable information if you don't know which ones are missing?

Since you already put the work into this I don't feel strong about using the merkle tree (altough there probable are many existing implementations), I'm struggling to understand what is the purpose of this exactly if we don't have the capability of signing it (correct me if I'm wrong here)? The PR description says This will help to identify whether files have been removed or the collection failed, so it's working on an assumption that someone purposefully tampered with the archive, so they are already "malicious" (or non-compliant?) - what's to stop them from tampering with the integrity information as well? Would you be able to tell the modified archive apart from an archive in which some manifests just weren't collected?

rzetelskik avatar Apr 17 '24 16:04 rzetelskik

would this approach also allow you to check if a given file was originally a part of the archive?

Transiently. But anything can be messed up with, it's just the amount of work that needs to be put in. Keep in mind this is meant primarily for detecting removed files, not to remove some and add fake files. At a point where the folder doesn't match we can just ask for a new collection.

We could extent this to go at the file level.

is that actually a valuable information if you don't know which ones are missing?

The most pressing concern is temper detection, after that point I'd usually just not proceed further.

Since you already put the work into this I don't feel strong about using the merkle tree (altough there probable are many existing implementations)

Using a merkle tree it is not easy or possible to use standard linux tooling for verification and you have to add extra commands. It's a valid option though.

what's to stop them from tampering with the integrity information as well?

nothing, with any approach we take. they are tempering with the archive with the intent to limit the data to what they think is not related (usually wrongly), not to be intentfully malicious.

tnozicka avatar Apr 18 '24 07:04 tnozicka

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

is that actually a valuable information if you don't know which ones are missing?

The most pressing concern is temper detection, after that point I'd usually just not proceed further.

Are you saying that we wouldn't investigate a customer issue if they removed any file, even if we weren't sure it was necessarily related to Scylla?

what's to stop them from tampering with the integrity information as well?

nothing, with any approach we take. they are tempering with the archive with the intent to limit the data to what they think is not related (usually wrongly), not to be intentfully malicious.

I mean, I get the point, but at this point you'd have to assume they deleted a related file and assume a somewhat hostile approach. I'm not sure how that'd turn out in practice.

Before we go into reviewing the implementation, @zimnx what are your thoughts on this?

rzetelskik avatar Apr 26 '24 13:04 rzetelskik