k8s.io registry.k8s.io: S3 buckets metrics

registry.k8s.io will use S3 buckets to distribute container blobs. We should be able to get metrics for the network traffic this generates. AWS provides those metrics through Amazon CloudWatch.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html#s3-cloudwatch-metrics

We should limit the scope of the metrics to the production buckets.
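As a rough illustration of what pulling one of these metrics could look like, here is a minimal sketch that builds the arguments for the CloudWatch `GetMetricStatistics` API (boto3's `get_metric_statistics`). The bucket name is hypothetical, and the helper `s3_metric_query` is made up for this example. The actual AWS call is left as a comment because it needs credentials for the account that owns the bucket.

```python
from datetime import datetime, timedelta, timezone


def s3_metric_query(bucket, metric_name, extra_dimensions, end,
                    days=7, period=86400, statistic="Average"):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    S3 publishes its metrics in the "AWS/S3" namespace. Every S3 metric
    carries a BucketName dimension; daily storage metrics add StorageType,
    and request metrics add FilterId.
    """
    dimensions = [{"Name": "BucketName", "Value": bucket}]
    dimensions += [
        {"Name": name, "Value": value}
        for name, value in sorted(extra_dimensions.items())
    ]
    return {
        "Namespace": "AWS/S3",
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": period,  # seconds; storage metrics are reported once a day
        "Statistics": [statistic],
    }


# Hypothetical bucket name, used only for illustration.
query = s3_metric_query(
    bucket="example-prod-registry-bucket",
    metric_name="BucketSizeBytes",
    extra_dimensions={"StorageType": "StandardStorage"},
    end=datetime(2022, 8, 25, tzinfo=timezone.utc),
)
print(query)

# With credentials for the bucket's account, the query could be sent with:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```

Keeping the query as a plain dictionary makes it easy to review which buckets and metrics are in scope before anything talks to AWS.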

/milestone v1.26
/area artifacts
/priority important-longterm

Aug 25 '22 19:08 ameukam

@BobyMCbobs is there a way to access those metrics from outside the AWS account that owns the production buckets? cc @Riaankl

Aug 25 '22 19:08 ameukam

@sftim is it possible to do this using cross-account replication?

Aug 26 '22 00:08 ameukam

@BobyMCbobs is there a way to access those metrics from outside the AWS account that owns the production buckets? cc @Riaankl

@ameukam, I believe a job may need to be set up to scrape the metrics and place them elsewhere. Otherwise, it might be a case for bucket replication through rclone. I'm looking into it more.

Aug 30 '22 02:08 BobyMCbobs

S3 has continuous, managed replication - can we use that?

Aug 30 '22 08:08 sftim

Does the community have any visibility into:

AWS Budget and spend rate
Which things are costing the most
Traffic served by s3

I know we privately got traffic data for the GCR stuff, and we have the public data studio billing report that breaks down the usage in GCP, but AFAICT we have nothing for AWS.

Oct 09 '22 21:10 BenTheElder

AWS Budget and spend rate
Which things are costing the most

Those should be handled in separate issues, since they concern the overall cost of the AWS organization (including other projects) rather than the metrics of a specific service.

Oct 09 '22 21:10 ameukam

I want to understand if we even have those to fall back on, considering we don't have much else on AWS.

If the answer is no, then yes, those need to be filed as issues, and IMHO they are very important long term, more so than this one.

Oct 09 '22 21:10 BenTheElder

I've tried approaching this several times, currently having a hard time with CloudWatch metrics.

Oct 10 '22 19:10 BobyMCbobs

filed https://github.com/kubernetes/k8s.io/issues/4348 for the budget visibility tangent

Oct 12 '22 20:10 BenTheElder

I'm not sure what we're using for metrics scraping and handling now, but perhaps the cloudwatch_exporter for Prometheus is an option. Here's an example for S3: https://github.com/prometheus/cloudwatch_exporter/blob/master/examples/S3.yml

Oct 29 '22 18:10 josh-ferrell

Have a look at https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Counting404Responses.html

We can do almost the same thing for 2xx responses, using mostly managed APIs. If we want to copy the counter data into Prometheus, we can do that too.

To generate those logs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html

Log delivery is best effort and is not real time. There are ways to make sure every single request to a bucket is logged, but I think folks wouldn't like them.
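To make the idea of counting responses by status concrete, here is a minimal sketch that tallies requests and bytes sent per status class from S3 server access log lines, then renders the totals in the Prometheus text exposition format. It parses the lines locally instead of using a CloudWatch Logs metric filter, so it is an alternative illustration rather than the approach from the linked page. The sample log lines, the bucket name, and the metric names are all made up. The field positions assume the documented server access log layout, where the HTTP status is the tenth field and bytes sent is the twelfth.

```python
import re
from collections import Counter

# A field is a [bracketed timestamp], a "quoted string", or a bare token.
_FIELD = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')
STATUS_INDEX = 9       # HTTP status (zero-based field position)
BYTES_SENT_INDEX = 11  # bytes sent; "-" when nothing was sent


def summarize(lines):
    """Count requests and bytes sent per status class (2xx, 4xx, ...)."""
    requests, bytes_sent = Counter(), Counter()
    for line in lines:
        fields = _FIELD.findall(line)
        if len(fields) <= BYTES_SENT_INDEX or not fields[STATUS_INDEX].isdigit():
            continue  # skip lines that do not look like access log records
        status_class = fields[STATUS_INDEX][0] + "xx"
        requests[status_class] += 1
        if fields[BYTES_SENT_INDEX].isdigit():
            bytes_sent[status_class] += int(fields[BYTES_SENT_INDEX])
    return requests, bytes_sent


def to_prometheus(bucket, requests, bytes_sent):
    """Render the counters as Prometheus text (metric names are hypothetical)."""
    out = []
    for name, counter in (
        ("s3_access_log_requests_total", requests),
        ("s3_access_log_bytes_sent_total", bytes_sent),
    ):
        out.append(f"# TYPE {name} counter")
        for status_class in sorted(counter):
            out.append(
                f'{name}{{bucket="{bucket}",status_class="{status_class}"}} '
                f"{counter[status_class]}"
            )
    return "\n".join(out) + "\n"


# Made-up sample records in the server access log layout.
sample = [
    'ownerid example-bucket [29/Oct/2022:19:00:00 +0000] 192.0.2.10 - REQ0001 '
    'REST.GET.OBJECT blobs/aaa "GET /blobs/aaa HTTP/1.1" 200 - 1024 1024 12 10 '
    '"-" "containerd/1.6" -',
    'ownerid example-bucket [29/Oct/2022:19:00:01 +0000] 192.0.2.11 - REQ0002 '
    'REST.GET.OBJECT blobs/bbb "GET /blobs/bbb HTTP/1.1" 200 - 2048 2048 20 18 '
    '"-" "containerd/1.6" -',
    'ownerid example-bucket [29/Oct/2022:19:00:02 +0000] 192.0.2.12 - REQ0003 '
    'REST.GET.OBJECT blobs/missing "GET /blobs/missing HTTP/1.1" 404 NoSuchKey '
    '300 - 5 - "-" "curl/7.85" -',
    "not a log line",
]

requests, bytes_sent = summarize(sample)
text = to_prometheus("example-bucket", requests, bytes_sent)
print(text)
```

Because log delivery is best effort, totals computed this way should be read as approximate.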

Oct 29 '22 19:10 sftim

https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html covers turning on CloudWatch metrics for a bucket
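For reference, a minimal sketch of what enabling request metrics could look like through the API instead of the console. It only builds the arguments for boto3's `put_bucket_metrics_configuration`. The bucket name, the configuration ID, and the helper `request_metrics_config` are assumptions for illustration. The configuration ID is the value that shows up as the `FilterId` dimension on the resulting CloudWatch request metrics.

```python
def request_metrics_config(bucket, config_id="EntireBucket", prefix=None):
    """Build keyword arguments for S3 put_bucket_metrics_configuration.

    Without a filter the configuration covers every object in the bucket;
    passing a prefix limits the request metrics to keys under that prefix.
    """
    metrics_configuration = {"Id": config_id}
    if prefix is not None:
        metrics_configuration["Filter"] = {"Prefix": prefix}
    return {
        "Bucket": bucket,
        "Id": config_id,
        "MetricsConfiguration": metrics_configuration,
    }


# Hypothetical bucket name, used only for illustration.
params = request_metrics_config("example-prod-registry-bucket")
print(params)

# With credentials for the bucket's account, this could be applied with:
#   import boto3
#   boto3.client("s3").put_bucket_metrics_configuration(**params)
```

Request metrics are billed like CloudWatch custom metrics, so it may be worth enabling them only on the production buckets.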

Oct 29 '22 19:10 sftim

/unassign @BobyMCbobs

/milestone v1.27

Jan 19 '23 14:01 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 19 '23 15:04 k8s-triage-robot

With the built-in dashboards we have some limited visibility into things like bandwidth usage and number of objects, but so far we can't distinguish egress from in-region traffic other than by correlating with the bills.

The bills are the thing we ultimately care about, but there's still room for more insight here.

/remove-lifecycle stale

Apr 19 '23 15:04 BenTheElder

/lifecycle stale

Jul 18 '23 15:07 k8s-triage-robot

/lifecycle rotten

Jan 19 '24 23:01 k8s-triage-robot

/remove-lifecycle rotten
/area infra/aws
/milestone v1.30

Feb 02 '24 15:02 ameukam

/milestone v1.31

Apr 18 '24 07:04 ameukam

/lifecycle stale

Jul 17 '24 07:07 k8s-triage-robot
