
registry.k8s.io S3 bucket auditing

Open BobyMCbobs opened this issue 3 years ago • 8 comments

Currently there's GCP auditing of events and resource usage in https://github.com/kubernetes/k8s.io/tree/main/audit. It would be good to also have auditing of the registry.k8s.io S3 buckets, writing to a repo when new objects are uploaded.

The auditing could be as simple as

# Regions that have a prod-registry-k8s-io-* mirror bucket.
REGIONS=(
    ap-northeast-1
    ap-south-1
    ap-southeast-1

    eu-central-1
    eu-west-1

    us-east-1
    us-east-2
    us-west-1
    us-west-2
)

# Dump each regional bucket's object listing to a JSON file.
# The buckets are publicly readable, so --no-sign-request needs no credentials.
for REGION in "${REGIONS[@]}"; do
  aws s3api list-objects --bucket "prod-registry-k8s-io-$REGION" --no-sign-request --output json > "bucket-$REGION.json"
done

in a CI job, making a PR into the k8s.io repo.
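
A rough sketch of such a CI wrapper (the script name dump-buckets.sh, the checkout paths, and the use of the gh CLI to open the PR are all assumptions for illustration):

#!/usr/bin/env bash
set -euo pipefail

# Inside a checkout of kubernetes/k8s.io (paths are hypothetical):
cd audit/registry.k8s.io

# Run the listing loop above, saved here as dump-buckets.sh,
# refreshing the bucket-*.json files in place.
./dump-buckets.sh

# Only open a PR when the listings actually changed.
if ! git diff --quiet -- .; then
  git checkout -b "audit-s3-$(date +%Y%m%d)"
  git add .
  git commit -m "audit: update registry.k8s.io S3 bucket listings"
  gh pr create --fill   # one way to open the PR; the real job may differ
fi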

related: https://github.com/kubernetes/k8s.io/issues/1834, https://github.com/kubernetes/k8s.io/issues/3623

BobyMCbobs avatar Jul 20 '22 02:07 BobyMCbobs

/area k8s.gcr.io
/area registry.k8s.io
/sig k8s-infra

BobyMCbobs avatar Jul 20 '22 20:07 BobyMCbobs

@BobyMCbobs: The label(s) area/registry.k8s.io cannot be applied, because the repository doesn't have them.

In response to this:

/area k8s.gcr.io
/area registry.k8s.io
/sig k8s-infra

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jul 20 '22 20:07 k8s-ci-robot

I'm playing around with this idea over here

https://github.com/ii/registry-k8s-io-s3-object-audit

BobyMCbobs avatar Jul 20 '22 23:07 BobyMCbobs

Any thoughts on including https://github.com/ii/registry-k8s-io-s3-object-audit in the kubernetes/k8s.io repo? It might be pretty easy to turn it into a ProwJob.

cc @ameukam @upodroid

BobyMCbobs avatar Sep 15 '22 20:09 BobyMCbobs

+1 to wrapping that github action as a prowjob and committing the diff to k/k8s.io

upodroid avatar Sep 15 '22 23:09 upodroid

Related: https://github.com/kubernetes/k8s.io/pull/4223

BobyMCbobs avatar Sep 16 '22 03:09 BobyMCbobs

With the full caveat that I have been away while most of the discussion and work on registry.k8s.io has happened, I don't understand why this is necessary.

Our audit scripts today do not list the contents of GCS buckets, GCR repos, or GAR repos. I'm not exactly clear why we feel compelled to do so for S3.

spiffxp avatar Oct 07 '22 18:10 spiffxp

If this is to be done, there needs to be another way than just pushing the audit into the main repo, because the audit files will be something like 90% of the repo.

asim-reza avatar Oct 08 '22 01:10 asim-reza

So I think what we're trying to audit is that the mirrors accurately reflect the canonical location. So we want to know if somehow the mirrors have (1) extra files or (2) corrupted files. We also want to know if the mirrors are missing files, but this could be a natural transitory state during initial publishing of new images.

I propose that we follow the image promoter pattern: we have a tool that can promote things based on the manifests in this repo. That tool should be able to run in "dry-run" mode, and should warn us about the problems above (and we can decide whether "missing file" is indeed a problem)
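
For illustration, a dry-run-style consistency check against one mirror could be as small as the following sketch (the choice of us-east-2 as canonical is an assumption, and ETag comparison only catches corruption for non-multipart uploads):

#!/usr/bin/env bash
set -euo pipefail

CANONICAL=prod-registry-k8s-io-us-east-2   # assumed canonical bucket
MIRROR=prod-registry-k8s-io-eu-west-1      # example mirror to check

list() {
  # Key plus ETag, so both extra and corrupted objects show up in the diff.
  aws s3api list-objects --bucket "$1" --no-sign-request \
    --query 'Contents[].[Key,ETag]' --output text | sort
}

# Lines only in MIRROR are extra/corrupted; lines only in CANONICAL are missing.
diff <(list "$CANONICAL") <(list "$MIRROR") || echo "NOTE: $MIRROR differs from $CANONICAL"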

What I don't know is how we currently populate the AWS bucket prod-registry-k8s-io-us-east-2. It looks like from that bucket we set up AWS bucket-to-bucket replication to populate the others, but I don't know how we get into the first AWS bucket. I would have expected to see it here (for example) and I do not. @BenTheElder do you know?
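
(As a side note, the replication wiring itself can at least be inspected directly; a sketch, though this call needs s3:GetReplicationConfiguration, so --no-sign-request won't work here:)

aws s3api get-bucket-replication --bucket prod-registry-k8s-io-us-east-2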

justinsb avatar Nov 11 '22 16:11 justinsb

So I think what we're trying to audit is that the mirrors accurately reflect the canonical location.

That's one thing, but it's not really necessary here, since we only serve these files after selecting them from the manifests, which we serve directly from the registry.

We need to be auditing for public visibility:

  • who has access
  • did anyone other than the robot make any changes
  • did any deletions occur from any account (should not happen even from the robot)

It would additionally be useful to scan for:

  • do the layers match the layers in the source registry

However, we're already continuously syncing these from the source of truth in CI, and if that sync fails it should alert, so the highest priority should be exposing who has access.
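
For the change and deletion points, one possible starting point is CloudTrail event history (a sketch; it assumes CloudTrail is enabled on the account, and object-level events such as DeleteObject only appear if a data-event trail is configured):

# Bucket-level changes are management events and show up in event history;
# object-level events (e.g. DeleteObject) require a data-event trail.
for EVENT in PutBucketAcl PutBucketPolicy DeleteBucket; do
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$EVENT" \
    --max-results 20 \
    --query 'Events[].[EventTime,Username,EventName]' \
    --output text
done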

I propose that we follow the image promoter pattern: we have a tool that can promote things based on the manifests in this repo. That tool should be able to run in "dry-run" mode, and should warn us about the problems above (and we can decide whether "missing file" is indeed a problem)

The image promoter needs to be able to populate the layers first.

I would have expected to see it here (for example) and I do not. @BenTheElder do you know?

It's run by syncing from the GCS bucket backing GCR, using a CI job in test-infra. In the future it will be done via OCI to S3 with an async job, because the image promoter is already not running smoothly and we can afford to continue populating this asynchronously.

BenTheElder avatar Nov 11 '22 19:11 BenTheElder

If this is to be done, there needs to be another way than just pushing the audit into the main repo, because the audit files will be something like 90% of the repo.

We might be talking about different things ...?

See: https://github.com/kubernetes/k8s.io/tree/main/audit

BenTheElder avatar Nov 11 '22 19:11 BenTheElder

We need to be auditing for public visibility: ... who has access

For that, I think we do want a script like the ones we have today, where we dump ACLs on the buckets into a file in this repo.

I think the debate is over whether we include the files themselves; the consensus seems to be "not via this audit mechanism". But we do want an audit mechanism, likely based around the syncer alerting on unexpected differences.

justinsb avatar Nov 11 '22 19:11 justinsb

We need to be auditing for public visibility: ... who has access

For that, I think we do want a script like the ones we have today, where we dump ACLs on the buckets into a file in this repo.

Regarding dumping bucket ACLs for objects in the registry.k8s.io buckets: I have a PR here, https://github.com/kubernetes/k8s.io/pull/4223, based on the code snippet in the description of this issue.

I think the debate is over whether we include the files themselves; the consensus seems to be "not via this audit mechanism". But we do want an audit mechanism, likely based around the syncer alerting on unexpected differences.

It appears to me from this comment that we do want ACLs dumped like I've proposed, but not in the way I've proposed. I'm trying to figure out what the thinking is around it; is there a way I can help make this happen?

BobyMCbobs avatar Nov 13 '22 19:11 BobyMCbobs

It appears to me from this comment that we do want ACLs dumped like I've proposed, but not in the way I've proposed. I'm trying to figure out what the thinking is around it; is there a way I can help make this happen?

I think we can start by dumping the ACLs for the buckets using get-bucket-acl, and probably also dump the policy attached to each bucket with get-bucket-policy.
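
A minimal sketch of that, reusing the REGIONS list from the issue description (these calls need credentials with s3:GetBucketAcl and s3:GetBucketPolicy, so --no-sign-request won't work here):

for REGION in "${REGIONS[@]}"; do
  BUCKET="prod-registry-k8s-io-$REGION"
  # Who has access at the bucket level.
  aws s3api get-bucket-acl --bucket "$BUCKET" --output json > "acl-$REGION.json"
  # The attached bucket policy, if any (the call fails when no policy is set).
  aws s3api get-bucket-policy --bucket "$BUCKET" \
    --query Policy --output text > "policy-$REGION.json" || true
done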

ameukam avatar Nov 16 '22 10:11 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 14 '23 10:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 16 '23 11:03 k8s-triage-robot

/remove-lifecycle rotten

ameukam avatar Mar 16 '23 11:03 ameukam

/area registry.k8s.io

ameukam avatar Mar 16 '23 11:03 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 14 '23 11:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 14 '23 12:07 k8s-triage-robot

/remove-lifecycle rotten

ameukam avatar Jul 14 '23 20:07 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 24 '24 13:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 23 '24 13:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 24 '24 13:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 24 '24 13:03 k8s-ci-robot