k8s.io

registry.k8s.io: S3 buckets metrics

Open ameukam opened this issue 3 years ago • 19 comments

registry.k8s.io will use S3 buckets to distribute container blobs. We should be able to get metrics on the network traffic they serve. AWS provides those metrics through AWS CloudWatch.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html#s3-cloudwatch-metrics

We should limit the scope of the metrics to the production buckets.
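For the production-bucket scope proposed here, a minimal Python sketch of querying the daily S3 storage metrics. It only builds the keyword arguments for boto3's CloudWatch `get_metric_statistics`, so it runs without AWS credentials. The bucket name, the `StandardStorage` storage type, and the 7-day window are illustrative assumptions, not decisions from this thread.

```python
from datetime import datetime, timedelta, timezone

# Daily S3 storage metrics in the AWS/S3 namespace. BucketSizeBytes is
# reported per storage class (StandardStorage assumed here); NumberOfObjects
# is reported under AllStorageTypes.
STORAGE_METRICS = {
    "BucketSizeBytes": "StandardStorage",
    "NumberOfObjects": "AllStorageTypes",
}


def storage_metric_queries(buckets, end, days=7):
    """Build kwargs for cloudwatch.get_metric_statistics(**query).

    One query per (bucket, metric); run them with a boto3 CloudWatch
    client in the region that hosts each bucket.
    """
    start = end - timedelta(days=days)
    queries = []
    for bucket in buckets:
        for metric, storage_type in STORAGE_METRICS.items():
            queries.append({
                "Namespace": "AWS/S3",
                "MetricName": metric,
                "Dimensions": [
                    {"Name": "BucketName", "Value": bucket},
                    {"Name": "StorageType", "Value": storage_type},
                ],
                "StartTime": start,
                "EndTime": end,
                "Period": 86400,  # storage metrics are published once per day
                "Statistics": ["Average"],
            })
    return queries


if __name__ == "__main__":
    # Hypothetical bucket name; substitute the real production buckets.
    prod = ["example-prod-registry-bucket"]
    for q in storage_metric_queries(prod, datetime.now(timezone.utc)):
        print(q["MetricName"], q["Dimensions"])
```

Keeping the query construction separate from the API call makes it easy to restrict the job to an explicit list of production buckets.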

/milestone v1.26
/area artifacts
/priority important-longterm

ameukam • Aug 25 '22 19:08

@BobyMCbobs is there a way to access those metrics outside of the AWS account that owns the production buckets? cc @Riaankl

ameukam • Aug 25 '22 19:08

@sftim is it possible to do this using cross-account replication?

ameukam • Aug 26 '22 00:08

@BobyMCbobs is there a way to access those metrics outside of the AWS account that owns the production buckets? cc @Riaankl

@ameukam, I believe a job may need to be set up to scrape the metrics out and place them elsewhere. Otherwise, this might be a case for bucket replication through rclone. I'm looking more into it.

BobyMCbobs • Aug 30 '22 02:08

S3 has continuous, managed replication - can we use that?

sftim • Aug 30 '22 08:08

Does the community have any visibility into:

  • AWS Budget and spend rate
  • Which things are costing the most
  • Traffic served by s3

I know we privately got traffic data for the GCR stuff, and we have the public data studio billing report that breaks down the usage in GCP, but AFAICT we have nothing for AWS.

BenTheElder • Oct 09 '22 21:10

  • AWS Budget and spend rate
  • Which things are costing the most

These should be tracked in separate issues, since they concern the overall cost of the AWS organization (including other projects) rather than metrics for a specific service.

ameukam • Oct 09 '22 21:10

I want to understand if we even have those to fall back on, considering we don't have much else on AWS.

If the answer is no, then yes, those need to be filed as issues, and IMHO they are very important long term, more so than this one.

BenTheElder • Oct 09 '22 21:10

I've tried approaching this several times; I'm currently having a hard time with CloudWatch metrics.

BobyMCbobs • Oct 10 '22 19:10

Filed https://github.com/kubernetes/k8s.io/issues/4348 for the budget visibility tangent.

BenTheElder • Oct 12 '22 20:10

I'm not sure what we're using for metrics scraping and handling now, but perhaps the cloudwatch_exporter for Prometheus is an option. Here's an example for S3: https://github.com/prometheus/cloudwatch_exporter/blob/master/examples/S3.yml

josh-ferrell • Oct 29 '22 18:10

Have a look at https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Counting404Responses.html

We can do almost the same thing for 2xx responses, mostly using managed APIs. If we want to copy the counter data into Prometheus, we can do that too.

To generate those logs, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html. Log delivery is best effort and is not real time. There are ways to make sure every single request to a bucket is logged, but I think folks wouldn't like 'em.
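As an illustration of counting responses by status class from server access logs, a minimal Python sketch. It assumes the log objects have already been downloaded from the target bucket and relies on the documented field order (HTTP status is the 10th field, Bytes Sent the 12th). Because log delivery is best effort, totals computed this way should be treated as approximate.

```python
import re
from collections import Counter

# S3 server access log records are space-separated, but the timestamp is
# wrapped in [...] and fields such as Request-URI, Referer and User-Agent
# are wrapped in "...". This pattern keeps each group as a single token.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Zero-based positions in the documented record layout.
HTTP_STATUS = 9
BYTES_SENT = 11


def summarize(lines):
    """Count responses per status class ("2xx", "4xx", ...) and sum bytes sent."""
    statuses = Counter()
    bytes_sent = 0
    for line in lines:
        fields = _TOKEN.findall(line)
        if len(fields) <= BYTES_SENT:
            continue  # skip truncated or malformed records
        status = fields[HTTP_STATUS]
        if status.isdigit():
            statuses[status[0] + "xx"] += 1
        if fields[BYTES_SENT].isdigit():  # "-" when no bytes were sent
            bytes_sent += int(fields[BYTES_SENT])
    return statuses, bytes_sent
```

For example, `summarize(open(path))` over one downloaded log object returns a `Counter` such as `{"2xx": ..., "4xx": ...}` plus the byte total, which could then be exported as counters.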

sftim • Oct 29 '22 19:10

https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html covers turning on CloudWatch request metrics for a bucket.
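A minimal Python sketch of what turning on request metrics and then querying them might involve, assuming boto3's `put_bucket_metrics_configuration` and `get_metric_statistics`. Only the keyword arguments are built here, so nothing touches AWS. The configuration ID `EntireBucket` and the selection of metrics are illustrative choices; unlike the daily storage metrics, request metrics only appear after such a configuration exists and are charged as CloudWatch metrics.

```python
from datetime import datetime, timedelta, timezone

# A subset of the S3 request metrics listed in the CloudWatch docs.
REQUEST_METRICS = ["AllRequests", "GetRequests", "BytesDownloaded",
                   "4xxErrors", "5xxErrors"]


def metrics_configuration(bucket, config_id="EntireBucket"):
    """Kwargs for s3.put_bucket_metrics_configuration(**args).

    With no Filter, the configuration covers every object in the bucket.
    The outer Id and MetricsConfiguration.Id must match.
    """
    return {
        "Bucket": bucket,
        "Id": config_id,
        "MetricsConfiguration": {"Id": config_id},
    }


def request_metric_queries(bucket, end, config_id="EntireBucket", hours=1):
    """Kwargs for cloudwatch.get_metric_statistics(**q), one per metric."""
    start = end - timedelta(hours=hours)
    return [
        {
            "Namespace": "AWS/S3",
            "MetricName": name,
            "Dimensions": [
                {"Name": "BucketName", "Value": bucket},
                # FilterId ties the datapoints to the metrics configuration.
                {"Name": "FilterId", "Value": config_id},
            ],
            "StartTime": start,
            "EndTime": end,
            "Period": 60,  # request metrics have 1-minute granularity
            "Statistics": ["Sum"],
        }
        for name in REQUEST_METRICS
    ]
```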

sftim • Oct 29 '22 19:10

/unassign @BobyMCbobs

/milestone v1.27

ameukam • Jan 19 '23 14:01

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot • Apr 19 '23 15:04

With the built-in dashboards we have some limited visibility into, e.g., bandwidth usage and number of objects, but so far we can't distinguish egress from in-region traffic other than by correlating with the bills.

The bills are the thing we ultimately care about, but there's still room for more insight here.
/remove-lifecycle stale
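One possible way to approximate egress versus in-region traffic without the bills, sketched in Python under loud assumptions: classify the Remote IP from server access logs against AWS's published `ip-ranges.json` (only its `prefixes` list with `ip_prefix` and `region` is used). This is a heuristic, not billing truth: clients reaching S3 through VPC endpoints may appear with private addresses, and entries with region `GLOBAL` land in the "other region" bucket.

```python
import ipaddress


def classify_ip(ip, ranges, bucket_region):
    """Rough traffic class for a client IP seen in S3 access logs.

    `ranges` has the shape of AWS's ip-ranges.json: a dict whose
    "prefixes" list holds {"ip_prefix": ..., "region": ...} entries.
    IPv6 ranges ("ipv6_prefixes") are not consulted in this sketch, so
    IPv6 clients are always reported as "internet".
    Returns "same-region", "other-aws-region", or "internet".
    """
    addr = ipaddress.ip_address(ip)
    matched_regions = set()
    for entry in ranges.get("prefixes", []):
        # Membership is simply False when IP versions differ.
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            matched_regions.add(entry["region"])
    if not matched_regions:
        return "internet"
    if bucket_region in matched_regions:
        return "same-region"
    return "other-aws-region"
```

Aggregating bytes sent per class would give a rough split to compare against the bills; the prefixes used in any test data should be made-up documentation ranges rather than real AWS ranges.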

BenTheElder • Apr 19 '23 15:04

/lifecycle stale

k8s-triage-robot • Jul 18 '23 15:07

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot • Jan 19 '24 23:01

/remove-lifecycle rotten
/area infra/aws
/milestone v1.30

ameukam • Feb 02 '24 15:02

/milestone v1.31

ameukam • Apr 18 '24 07:04

/lifecycle stale

k8s-triage-robot • Jul 17 '24 07:07