registry.k8s.io: S3 buckets metrics
registry.k8s.io will use S3 buckets to distribute container blobs. We should be able to get the metrics generated by that traffic. AWS provides those metrics through Amazon CloudWatch.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html#s3-cloudwatch-metrics
We should focus the scope of the metrics to the production buckets.
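As a sketch of what pulling those CloudWatch metrics could look like with boto3 (the bucket name and time window below are placeholders; the daily storage metrics are free, whereas request metrics need a metrics configuration on the bucket):

```python
import datetime

def s3_storage_metric_query(bucket, metric="NumberOfObjects",
                            storage_type="AllStorageTypes", days=7):
    """Build get_metric_statistics parameters for an S3 daily storage metric.

    The bucket name is a placeholder; CloudWatch reports BucketSizeBytes /
    NumberOfObjects once per day in the AWS/S3 namespace.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        "StartTime": now - datetime.timedelta(days=days),
        "EndTime": now,
        "Period": 86400,  # storage metrics are reported daily
        "Statistics": ["Average"],
    }

# With credentials for the production account, this would be passed to boto3:
#   import boto3
#   cw = boto3.client("cloudwatch")
#   resp = cw.get_metric_statistics(**s3_storage_metric_query("example-prod-bucket"))
```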
/milestone v1.26
/area artifacts
/priority important-longterm
@BobyMCbobs is there a way to access those metrics from outside the AWS account that owns the production buckets? cc @Riaankl
@sftim possible to do it using cross-account replication?
@ameukam, I believe some job may need to be set up to scrape the metrics out and place them elsewhere. Otherwise it might be a case for bucket replication through rclone. Looking more into it.
S3 has continuous, managed replication - can we use that?
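For reference, a cross-account replication rule is just a bucket-level configuration. A minimal sketch of the shape it takes (all ARNs and account IDs below are made up; versioning must already be enabled on both buckets):

```python
def cross_account_replication_config(role_arn, dest_bucket_arn, dest_account_id):
    """Build a ReplicationConfiguration for S3 PutBucketReplication.

    All identifiers passed in are placeholders.
    """
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter == whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "Account": dest_account_id,
                    # hand object ownership to the destination account
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    }

# Applied (hypothetically) with:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-source-bucket",
#       ReplicationConfiguration=cross_account_replication_config(
#           "arn:aws:iam::111111111111:role/replication",
#           "arn:aws:s3:::example-dest-bucket", "222222222222"))
```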
Does the community have any visibility into:
- AWS Budget and spend rate
- Which things are costing the most
- Traffic served by S3
I know we privately got traffic data for the GCR stuff, and we have the public data studio billing report that breaks down the usage in GCP, but AFAICT we have nothing for AWS.
> AWS Budget and spend rate
> Which things are costing the most

These should be treated in separate issues, since they're about the overall cost of the AWS organization (including other projects) vs metrics of a specific service.
I want to understand if we even have those to fall back on, considering we don't have much else on AWS.
If the answer is no, then yes, those need to be filed as issues, and IMHO are very important long term, more so than this one.
I've tried approaching this several times, currently having a hard time with CloudWatch metrics.
filed https://github.com/kubernetes/k8s.io/issues/4348 for the budget visibility tangent
I'm not sure what we're using for metrics scraping and handling now, but perhaps the cloudwatch_exporter for Prometheus is an option. Here's an example for S3: https://github.com/prometheus/cloudwatch_exporter/blob/master/examples/S3.yml
Have a look at https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Counting404Responses.html
We can do almost the same thing for 2xx responses, using mostly managed APIs. If we want to copy the counter data into Prometheus we can do that too.
To generate those logs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html Log delivery is best effort and is not real time. There are ways to make sure to log every single request to a bucket, but folks I think wouldn't like 'em.
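If those access logs were shipped into CloudWatch Logs, counting 2xx responses (the analogue of the 404 example linked above) would be a metric filter along these lines. The log group name and the space-delimited field layout are assumptions borrowed from the Apache example in the AWS doc:

```python
def count_2xx_metric_filter(log_group="/s3/registry-access-logs"):
    """Build put_metric_filter parameters counting 2xx responses.

    Assumes access logs land in CloudWatch Logs with a space-delimited
    layout like the Apache example in the AWS doc; the log group name,
    namespace, and field order here are placeholders.
    """
    return {
        "logGroupName": log_group,
        "filterName": "Count2xxResponses",
        # space-delimited pattern: match any status code starting with 2
        "filterPattern": "[host, logName, user, timestamp, request, statusCode=2*, size]",
        "metricTransformations": [
            {
                "metricName": "2xxCount",
                "metricNamespace": "RegistryAccess",
                "metricValue": "1",  # each matching log line counts as 1
            }
        ],
    }

# With credentials:
#   boto3.client("logs").put_metric_filter(**count_2xx_metric_filter())
```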
https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html covers turning on CloudWatch metrics for a bucket
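A sketch of turning that on programmatically (bucket name and filter id are placeholders; unlike the free daily storage metrics, request metrics are billed at CloudWatch custom-metric rates):

```python
def request_metrics_params(bucket, filter_id="EntireBucket"):
    """Build put_bucket_metrics_configuration parameters.

    A MetricsConfiguration with no Filter covers every request to the
    bucket; the names here are placeholders.
    """
    return {
        "Bucket": bucket,
        "Id": filter_id,
        "MetricsConfiguration": {"Id": filter_id},
    }

# With credentials:
#   boto3.client("s3").put_bucket_metrics_configuration(
#       **request_metrics_params("example-prod-bucket"))
```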
/unassign @BobyMCbobs
/milestone v1.27
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
With the built in dashboards we have some limited visibility into e.g. bandwidth usage and number of objects, but so far we can't tell things like egress vs in-region other than by correlating with the bills.
The bills are the thing we ultimately care about, but there's still room for more insight here.
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/area infra/aws
/milestone v1.30
/milestone v1.31
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale