AWS Billing visibility

Open BenTheElder opened this issue 2 years ago • 18 comments

see previously: https://github.com/kubernetes/k8s.io/issues/4141#issuecomment-1272628985

TLDR:

We should have some idea of where we stand on consuming the AWS resources. Right now I think not even the SIG K8s Infra chairs or Steering know this. On GCP we have a public report in SIG K8s Infra with a detailed breakdown, so we have some idea of where we stand there.

BenTheElder avatar Oct 12 '22 20:10 BenTheElder

/assign @hh @Riaankl @calebamiles
/area billing
/committee steering

ameukam avatar Oct 12 '22 21:10 ameukam

/unassign @calebamiles
/assign @BobyMCbobs

riaankleinhans avatar Oct 13 '22 22:10 riaankleinhans

/unassign @calebamiles
/assign @BobyMCbobs

Oops. 😅

ameukam avatar Oct 14 '22 02:10 ameukam

Amazon CloudWatch publishes billing-related metrics in the AWS/Billing metrics namespace. We can do things with those and maybe aggregate them with other cost data.
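As a rough illustration of the idea above, here is a minimal Python sketch that builds a CloudWatch `GetMetricStatistics` request for the `EstimatedCharges` metric in the `AWS/Billing` namespace. The function names, the 7-day window and the `USD` currency are assumptions made for the example, not anything decided in this thread. It also assumes billing alerts are enabled on the payer account and that the metric is read from `us-east-1`, which is where AWS stores billing metric data.

```python
from datetime import datetime, timedelta, timezone


def estimated_charges_request(end: datetime, days: int = 7, currency: str = "USD") -> dict:
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    The window length and currency are illustrative defaults.
    """
    return {
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": currency}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Maximum"],
    }


def fetch_estimated_charges(request: dict) -> list:
    """Send the request with boto3 (needs AWS credentials; not run here)."""
    import boto3  # imported lazily so building the request needs no AWS SDK

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


request = estimated_charges_request(datetime(2022, 10, 29, tzinfo=timezone.utc))
```

Calling `fetch_estimated_charges(request)` with suitable credentials should return the daily datapoints; aggregating them with other cost data would happen downstream.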

sftim avatar Oct 29 '22 19:10 sftim

I have worked out how to do this and I have a functional demo to show at this week's meeting.

TL;DR:

  • S3 Metrics:
    • CloudWatch metrics are a bit meh (mix of daily and minute metrics) :(
    • Gathering some of the metrics has some cost implications. I'm not sure exactly how much.
  • For Cost:
    • We need to enable Cost and Usage Reports (CURs) and export them to S3
    • Use Glue to catalog the reports so Athena can query them
    • Crunch the data in Athena
    • Visualise it in Grafana
    • I wouldn't bother using CloudWatch to read billing metrics at all. The data is stale and has less detail than the CURs
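To make the "crunch the data in Athena" step a bit more concrete, here is a small Python sketch that builds a per-account, per-service cost query over a CUR table and the matching arguments for Athena's `StartQueryExecution`. The database name, table name and results bucket are hypothetical placeholders. The column names follow the standard CUR-to-Athena integration, and the sketch assumes the usual `year`/`month` string partitions without zero padding, which is worth checking against the actual table.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def monthly_cost_query(database: str, table: str, year: int, month: int) -> str:
    """Return SQL that sums unblended cost per account and product for one month."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return (
        "SELECT line_item_usage_account_id AS account_id,\n"
        "       line_item_product_code AS product,\n"
        "       SUM(line_item_unblended_cost) AS cost\n"
        f"FROM {database}.{table}\n"
        # Assumption: partitions are strings such as year='2022', month='10'.
        f"WHERE year = '{year}' AND month = '{month}'\n"
        "GROUP BY line_item_usage_account_id, line_item_product_code\n"
        "ORDER BY cost DESC"
    )


def athena_request(query: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for boto3's athena.start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, for illustration only.
query = monthly_cost_query("cur_db", "cur_table", 2022, 10)
request = athena_request(query, "cur_db", "s3://example-athena-results/")
```

With credentials for the account that hosts Athena, `boto3.client("athena").start_query_execution(**request)` would submit the query. A Grafana Athena data source could run the same SQL directly.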

On a side note, we should probably start looking at using ArgoCD to deploy k8s manifests instead. Knative uses ArgoCD to deploy Kubernetes manifests; have a look at https://github.com/knative/test-infra/issues/3363 for details.

Notes:

  • https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html
  • https://docs.aws.amazon.com/cur/latest/userguide/cur-query-athena.html
  • https://docs.aws.amazon.com/cur/latest/userguide/cur-create.html (don't forget to include the resource IDs)
  • There is a bug in Grafana that blocks cross-account access, which is needed because Athena lives in the Master Payer account and the S3 metrics live in the registry.k8s.io account: https://github.com/grafana/grafana/issues/57664
  • https://github.com/borg-land/k8s-demo/blob/dev/monitoring/values.yaml#L3

upodroid avatar Oct 31 '22 15:10 upodroid

[Two screenshots attached: chrome_87zaebBE8z, chrome_uZrCcaonJx]

upodroid avatar Oct 31 '22 15:10 upodroid

Great job @upodroid !!

riaankleinhans avatar Oct 31 '22 17:10 riaankleinhans

I built this some time ago; hopefully it can be put to good use: https://github.com/kubernetes/k8s.io/tree/main/infra/aws/aws-costexplorer-export

It currently exports all the Cost Explorer reports from the root-level account into a bucket and pushes them over to a BigQuery dataset. It will need to be modified to filter down to just the Kubernetes-related accounts.
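For illustration only, one way to restrict Cost Explorer data to specific accounts is the `LINKED_ACCOUNT` dimension filter on `GetCostAndUsage`. The Python sketch below is not the code of the export tool linked above. The helper names and the account IDs are hypothetical, and it only shows the request shape plus how a response could be totalled per account.

```python
from decimal import Decimal


def linked_account_request(account_ids, start: str, end: str) -> dict:
    """Build keyword arguments for boto3's ce.get_cost_and_usage.

    start and end are YYYY-MM-DD strings; end is exclusive.
    """
    if not account_ids:
        raise ValueError("at least one account ID is required")
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": list(account_ids)}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
    }


def cost_per_account(response: dict) -> dict:
    """Sum UnblendedCost per linked account across all returned periods."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            account = group["Keys"][0]
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[account] = totals.get(account, Decimal("0")) + amount
    return totals


# Hypothetical account IDs and a hand-written response in the API's shape.
request = linked_account_request(["111111111111", "222222222222"], "2022-10-01", "2022-11-01")
sample_response = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["111111111111"], "Metrics": {"UnblendedCost": {"Amount": "12.50", "Unit": "USD"}}},
            {"Keys": ["222222222222"], "Metrics": {"UnblendedCost": {"Amount": "3.00", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["111111111111"], "Metrics": {"UnblendedCost": {"Amount": "0.25", "Unit": "USD"}}},
        ]},
    ]
}
totals = cost_per_account(sample_response)
```

With credentials on the payer account, `boto3.client("ce").get_cost_and_usage(**request)` would return a response that `cost_per_account` can consume. `Decimal` is used so that summed amounts do not pick up floating-point error.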

BobyMCbobs avatar Nov 08 '22 20:11 BobyMCbobs

/priority important-longterm
/milestone v1.27

ameukam avatar Nov 24 '22 19:11 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 22 '23 19:02 k8s-triage-robot

/remove-lifecycle stale

ameukam avatar Feb 22 '23 20:02 ameukam

We currently have these:

  • GCP Data Studio Report
  • console.cloud.google.com/billing
  • AWS Cost

Working with Kubecost to consolidate information

riaankleinhans avatar Feb 22 '23 22:02 riaankleinhans

/lifecycle stale

k8s-triage-robot avatar May 23 '23 23:05 k8s-triage-robot

/remove-lifecycle stale

ameukam avatar May 24 '23 16:05 ameukam

/lifecycle stale

k8s-triage-robot avatar Jan 21 '24 04:01 k8s-triage-robot

/lifecycle stale

k8s-triage-robot avatar Apr 22 '24 17:04 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar May 22 '24 17:05 k8s-triage-robot

/lifecycle frozen

As a stopgap, we publish high-level reports on overall spend to the SIG Slack weekly, and in the biweekly meetings we look at the AWS web console via someone who has access.

BenTheElder avatar May 22 '24 19:05 BenTheElder