promo-tools
Auditor not querying GCR
What happened:
The GCP project k8s-artifacts-prod does not appear to show any requests to its GCR API. This is unexpected behavior: the auditor should be continuously verifying new GCR state changes. When it sees a new image that is not listed in k/k8s.io, it consults GCR to verify that image. Since there have not been ANY queries to GCR in 30+ days, the auditor must not be running correctly.
Related to k/k8s.io issue: #2364
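For context, "consulting GCR" here amounts to hitting the Registry v2 API for the image in question, which is the traffic we would expect to see. A minimal sketch of the kind of endpoint involved (the host, repo, and digest below are illustrative placeholders, not the auditor's actual values):

```go
package main

import "fmt"

// gcrManifestURL builds the Docker Registry v2 manifest endpoint that an
// auditor could GET to check whether a digest exists in a repository.
// The real auditor derives host/repo from the promoter manifests; these
// arguments are hypothetical.
func gcrManifestURL(host, repo, digest string) string {
	return fmt.Sprintf("https://%s/v2/%s/manifests/%s", host, repo, digest)
}

func main() {
	// Hypothetical image reference, for illustration only.
	fmt.Println(gcrManifestURL("gcr.io", "k8s-staging-foo/pause", "sha256:abc123"))
}
```

Requests of this shape would count against GCR quota but, as noted below, do not necessarily show up as Container Registry API calls in the project's API metrics.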
What you expected to happen:
The Container Registry API should show some amount of requests coming from the auditor.
How to reproduce it (as minimally and precisely as possible):
N/A
Anything else we need to know?:
Environment:
- Cloud provider or hardware configuration: GCP

cc: @listx @amwat @justaugustus @kubernetes-sigs/release-engineering
Thinking about this again, I think it's because the auditor queries the staging GCRs, not prod. See https://github.com/kubernetes-sigs/k8s-container-image-promoter/blob/bed8661305e1e325d0dd22b61e0cea2967b0f0e4/legacy/audit/auditor.go#L261-L266. The subproject here means the staging repo that belongs to the subproject, not production.
@listx appears to be correct! Here we see two functions that make HTTP requests to GCR directly; I don't believe those are monitored by the Container Registry API, which would explain why the auditor is not triggering those metrics. PR #356 aims to gain visibility into this by logging the total number of GCR requests per 10-minute period. This is how GCR measures its quotas, so the logs will show how close the auditor is getting to the upper limit of 50,000 requests per 10 minutes.
Deployment of this PR: #383 should help us gain insight into the auditor's network usage.
According to the logs, the auditor is periodically restarting.
It doesn't show any error or failing behavior, which makes a strong case that Cloud Run is throttling it. Cloud Run's auto-scaling may be giving us this unwanted behavior: looking at the metrics, the auditor uses very little CPU and receives no requests, so Cloud Run scales it down and it remains idle.
Although the docs suggest specifying --min-instances to keep the service "permanently available", doing so in our deploy script does not seem to have that effect.
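For reference, passing the flag would look roughly like this (the service name, image, and region below are placeholders, not our deploy script's actual values; at the time, --min-instances may require the gcloud beta track for fully managed Cloud Run):

```shell
# Placeholders throughout; the real values come from the deploy script.
gcloud beta run deploy cip-auditor \
  --image=gcr.io/EXAMPLE_PROJECT/auditor \
  --platform=managed \
  --region=us-central1 \
  --min-instances=1
```

With --min-instances=1, one instance should stay warm even with no incoming requests, which is exactly the behavior we are not observing.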
We have successfully captured an instance where the auditor has exceeded quota! This was during the audit of mock/kube-scheduler:
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle stale
/lifecycle rotten
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.