registry.k8s.io S3 bucket auditing
Currently there's GCP auditing of events and resource usage in https://github.com/kubernetes/k8s.io/tree/main/audit. It would be good to have auditing of the registry.k8s.io S3 buckets as well, writing to a repo when new objects are uploaded.
The auditing could be as simple as
REGIONS=(
ap-northeast-1
ap-south-1
ap-southeast-1
eu-central-1
eu-west-1
us-east-1
us-east-2
us-west-1
us-west-2
)
for REGION in "${REGIONS[@]}"; do
aws s3api list-objects --bucket "prod-registry-k8s-io-$REGION" --no-sign-request --output json > "bucket-$REGION.json"
done
in a CI job, making a PR into the k8s.io repo.
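For illustration, here is a minimal sketch of what such a job could look like, assuming it runs from a checkout of kubernetes/k8s.io with the AWS CLI and git available; the output directory and commit message are placeholders, and PR creation is left to whatever automation wraps the job:

```bash
#!/usr/bin/env bash
# Sketch only: list every prod-registry-k8s-io bucket and commit the snapshots
# if anything changed. The audit/registry.k8s.io directory is illustrative.
set -euo pipefail

REGIONS=(
  ap-northeast-1 ap-south-1 ap-southeast-1
  eu-central-1 eu-west-1
  us-east-1 us-east-2 us-west-1 us-west-2
)

OUT_DIR=audit/registry.k8s.io
mkdir -p "${OUT_DIR}"

for REGION in "${REGIONS[@]}"; do
  aws s3api list-objects \
    --bucket "prod-registry-k8s-io-${REGION}" \
    --no-sign-request \
    --output json > "${OUT_DIR}/bucket-${REGION}.json"
done

# git status --porcelain also catches brand-new (untracked) snapshot files.
if [ -n "$(git status --porcelain "${OUT_DIR}")" ]; then
  git add "${OUT_DIR}"
  git commit -m "audit: update registry.k8s.io S3 bucket listings"
fi
```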
related: https://github.com/kubernetes/k8s.io/issues/1834, https://github.com/kubernetes/k8s.io/issues/3623
/area k8s.gcr.io
/area registry.k8s.io
/sig k8s-infra
@BobyMCbobs: The label(s) area/registry.k8s.io cannot be applied, because the repository doesn't have them.
In response to this:
> /area k8s.gcr.io
> /area registry.k8s.io
> /sig k8s-infra
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm playing around with this idea over here
https://github.com/ii/registry-k8s-io-s3-object-audit
Any thoughts around including https://github.com/ii/registry-k8s-io-s3-object-audit into the kubernetes/k8s.io repo? Might be pretty easy to turn it into a ProwJob.
cc @ameukam @upodroid
+1 to wrapping that github action as a prowjob and committing the diff to k/k8s.io
Related: https://github.com/kubernetes/k8s.io/pull/4223
With the full caveat that I have been away while most of the discussion and work on registry.k8s.io has happened, I don't understand why this is necessary.
Our audit scripts today do not list the contents of GCS buckets, GCR repos, or GAR repos. I'm not exactly clear why we feel compelled to do so for S3.
If this is to be done, there needs to be another way than just pushing the audit into the main repo, because the audit files would be something like 90% of the repo.
So I think what we're trying to audit is that the mirrors accurately reflect the canonical location. So we want to know if somehow the mirrors have (1) extra files or (2) corrupted files. We also want to know if the mirrors are missing files, but this could be a natural transitory state during initial publishing of new images.
I propose that we follow the image promoter pattern: we have a tool that can promote things based on the manifests in this repo. That tool should be able to run in "dry-run" mode, and should warn us about the problems above (and we can decide whether "missing file" is indeed a problem)
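As a rough illustration of the kind of dry-run comparison that could flag extra, missing, or differing objects, here is a sketch that diffs one regional replica against the us-east-2 bucket (which, per the next paragraph, appears to seed the other regions via replication); checking against the true canonical registry would instead compare layer digests from the promoter manifests:

```bash
#!/usr/bin/env bash
# Sketch only: compare object keys and ETags between the primary bucket and one
# regional replica. ETags can legitimately differ for multipart uploads, so a
# real checker would compare digests from the promoter manifests instead.
set -euo pipefail

PRIMARY=prod-registry-k8s-io-us-east-2
REPLICA=prod-registry-k8s-io-eu-west-1

list_keys() {
  aws s3api list-objects --bucket "$1" --no-sign-request \
    --query 'Contents[].[Key,ETag]' --output text | sort
}

# Lines only in the replica are extra objects; lines only in the primary are
# missing from the replica (possibly just not replicated yet).
diff <(list_keys "$PRIMARY") <(list_keys "$REPLICA") || echo "buckets differ"
```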
What I don't know is how we currently populate the AWS bucket prod-registry-k8s-io-us-east-2. It looks like from that bucket we set up AWS bucket-to-bucket replication to populate the others, but I don't know how we get into the first AWS bucket.
I would have expected to see it here (for example) and I do not. @BenTheElder do you know?
> So I think what we're trying to audit is that the mirrors accurately reflect the canonical location.
That's one thing, but it's not really necessary here, since we only serve these files based on selecting them from the manifests which we serve from the registry directly.
We need to be auditing for public visibility:
- who has access
- did anyone other than the robot make any changes
- did any deletions occur from any account (should not happen even from the robot; see the sketch below)
It would additionally be useful to scan for:
- do the layers match the layers in the source registry
However, we're already continuously syncing these from the source of truth in CI, and if that fails it should alert, so the highest priority should start with exposing who has access.
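As a sketch of the "did any deletions occur" check above, assuming object listings are snapshotted over time (file names here are illustrative), keys that were present in the previous snapshot but are now gone could be flagged like this:

```bash
#!/usr/bin/env bash
# Sketch only: flag objects that existed in the last committed listing snapshot
# but are no longer present in the bucket. Deletions should never happen, even
# from the publishing robot.
set -euo pipefail

BUCKET=prod-registry-k8s-io-us-east-2
SNAPSHOT=bucket-us-east-2.json   # previously committed list-objects output

aws s3api list-objects --bucket "$BUCKET" --no-sign-request \
  --query 'Contents[].Key' --output text | tr '\t' '\n' | sort > current-keys.txt
jq -r '.Contents[].Key' "$SNAPSHOT" | sort > previous-keys.txt

# comm -23 prints lines only in the first (previous) file, i.e. vanished keys.
DELETED=$(comm -23 previous-keys.txt current-keys.txt)
if [ -n "$DELETED" ]; then
  printf 'Deleted objects detected:\n%s\n' "$DELETED"
  exit 1
fi
```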
> I propose that we follow the image promoter pattern: we have a tool that can promote things based on the manifests in this repo. That tool should be able to run in "dry-run" mode, and should warn us about the problems above (and we can decide whether "missing file" is indeed a problem)
The image promoter needs to be able to populate the layers first.
> I would have expected to see it here (for example) and I do not. @BenTheElder do you know?
It's run via syncing the GCS bucket in GCR, using a CI job in test-infra. In the future it will be done via OCI to S3 with an async job, because the image promoter is already not running smoothly, and we can afford to continue populating this async.
> If this is to be done, there needs to be another way than just pushing the audit into the main repo, because the audit files would be something like 90% of the repo.
We might be talking about different things ...?
See: https://github.com/kubernetes/k8s.io/tree/main/audit
> We need to be auditing for public visibility: ... who has access
For that, I think we do want a script like the ones we have today, where we dump ACLs on the buckets into a file in this repo.
I think the debate comes down to whether we include the files themselves; the consensus seems to be "not by this audit mechanism". But we do want an audit mechanism, likely based around the syncer alerting on unexpected differences.
> We need to be auditing for public visibility: ... who has access
> For that, I think we do want a script like the ones we have today, where we dump ACLs on the buckets into a file in this repo.
In regards to dumping bucket ACLs for objects in the registry.k8s.io buckets, I have a PR here https://github.com/kubernetes/k8s.io/pull/4223 based on the code snippet in the description of this issue.
> I think the debate comes down to whether we include the files themselves; the consensus seems to be "not by this audit mechanism". But we do want an audit mechanism, likely based around the syncer alerting on unexpected differences.
It appears to me from this comment that we do want ACLs dumped as I've proposed, but not in the way I've proposed. I'm trying to figure out what the thinking is around it; is there a way I can help make the request happen?
> It appears to me from this comment that we do want ACLs dumped as I've proposed, but not in the way I've proposed. I'm trying to figure out what the thinking is around it; is there a way I can help make the request happen?
I think we can start to get the ACLs for the buckets dumped using get-bucket-acl and probably also dump the policy attached to each bucket with get-bucket-policy.
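For example, a minimal sketch along those lines, assuming credentials that are allowed to read the bucket ACL and policy (the output file names are placeholders):

```bash
#!/usr/bin/env bash
# Sketch only: dump the ACL and bucket policy for each registry bucket.
# get-bucket-policy fails when no policy is attached, hence the fallback.
set -euo pipefail

REGIONS=(ap-northeast-1 ap-south-1 ap-southeast-1 eu-central-1 eu-west-1
         us-east-1 us-east-2 us-west-1 us-west-2)

for REGION in "${REGIONS[@]}"; do
  BUCKET="prod-registry-k8s-io-${REGION}"
  aws s3api get-bucket-acl --bucket "$BUCKET" --output json > "acl-${REGION}.json"
  aws s3api get-bucket-policy --bucket "$BUCKET" --output json \
    > "policy-${REGION}.json" || echo '{}' > "policy-${REGION}.json"
done
```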
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/area registry.k8s.io
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, lifecycle/stale is applied
> - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
> - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
> You can:
> - Reopen this issue with /reopen
> - Mark this issue as fresh with /remove-lifecycle rotten
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.