k8s.io AWS Billing visibility

see previously: https://github.com/kubernetes/k8s.io/issues/4141#issuecomment-1272628985

TLDR:

We should have some idea of where we stand on consuming AWS resources. Right now I think not even the SIG K8s Infra chairs and/or Steering know this. On GCP we have a public report in SIG K8s Infra with a detailed breakdown, so we have some idea of where we stand there.

Oct 12 '22 20:10 BenTheElder

/assign @hh @Riaankl @calebamiles
/area billing
/committee steering

Oct 12 '22 21:10 ameukam

/unassign @calebamiles
/assign @BobyMCbobs

Oct 13 '22 22:10 riaankleinhans

/unassign @calebamiles
/assign @BobyMCbobs

Oops. 😅

Oct 14 '22 02:10 ameukam

Amazon CloudWatch publishes billing-related metrics in the AWS/Billing metrics namespace. We could read those and maybe aggregate them with other cost data.
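A minimal sketch of what reading that namespace could look like, assuming boto3 and credentials for an account with billing alerts enabled. `EstimatedCharges` with a `Currency` dimension is the standard billing metric, and billing metrics are stored in us-east-1; the function names and the 7-day window are illustrative choices, not anything agreed in this issue.

```python
from datetime import datetime, timedelta, timezone


def estimated_charges_request(days=7, currency="USD", now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": currency}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Maximum"],  # the metric is a running month-to-date total
    }


def fetch_estimated_charges(days=7):
    """Call CloudWatch and return the datapoints, oldest first."""
    import boto3  # imported lazily so the request builder has no dependencies

    # Assumption: billing metrics are only available in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    response = cloudwatch.get_metric_statistics(**estimated_charges_request(days))
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Because `EstimatedCharges` is cumulative within a month, daily spend would have to be derived by differencing consecutive datapoints.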

Oct 29 '22 19:10 sftim

I have worked out how to do this and I have a functional demo to show at this week's meeting.

TL;DR:

S3 Metrics:
- CloudWatch metrics are a bit meh (a mix of daily and per-minute metrics) :(
- Gathering some of the metrics has cost implications. I'm not sure exactly how much.

For Cost:
- We need to enable CURs (Cost and Usage Reports) and export them to S3
- Use Glue to make them queryable in Athena
- Crunch the data in Athena
- Visualise it in Grafana
- I wouldn't bother using CloudWatch to read billing metrics at all. They are stale and don't have as much info as the CURs
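The "crunch the data in Athena" step could be sketched roughly as below. This is not the demo's actual query: the table name passed in is hypothetical, and the column and partition names (`line_item_usage_account_id`, `line_item_product_code`, `line_item_unblended_cost`, string-typed `year` and `month`) assume the standard CUR-to-Athena integration.

```python
import re


def monthly_cost_query(table, year, month):
    """Build an Athena query that sums unblended cost per account and service."""
    # Only allow plain identifiers, since the table name is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range: {month!r}")
    return (
        "SELECT line_item_usage_account_id AS account_id,\n"
        "       line_item_product_code AS service,\n"
        "       SUM(line_item_unblended_cost) AS cost\n"
        f"FROM {table}\n"
        # Assumption: year and month are string partitions without zero padding.
        f"WHERE year = '{int(year)}' AND month = '{int(month)}'\n"
        "GROUP BY 1, 2\n"
        "ORDER BY cost DESC"
    )


def start_query(sql, database, output_location):
    """Submit the query to Athena and return its execution ID."""
    import boto3  # imported lazily so the query builder has no dependencies

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A Grafana Athena data source could run the same SQL directly, so the `start_query` helper is only needed for ad hoc checks outside Grafana.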

On a side note, we should probably start looking at using ArgoCD to deploy k8s manifests instead. Knative uses ArgoCD to deploy Kubernetes manifests; have a look at https://github.com/knative/test-infra/issues/3363 for details.

Notes:

- https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html
- https://docs.aws.amazon.com/cur/latest/userguide/cur-query-athena.html
- https://docs.aws.amazon.com/cur/latest/userguide/cur-create.html (don't forget the resource IDs)
- There is a bug in Grafana that blocks cross-account access. We need that access because Athena lives in the Master Payer account and the S3 metrics live in the registry.k8s.io account: https://github.com/grafana/grafana/issues/57664
- https://github.com/borg-land/k8s-demo/blob/dev/monitoring/values.yaml#L3

Oct 31 '22 15:10 upodroid

[Two screenshots: chrome_87zaebBE8z, chrome_uZrCcaonJx]

Oct 31 '22 15:10 upodroid

Great job @upodroid !!

Oct 31 '22 17:10 riaankleinhans

I built this some time ago; hopefully it can be put to good use: https://github.com/kubernetes/k8s.io/tree/main/infra/aws/aws-costexplorer-export

It currently exports all the Cost Explorer reports from the root-level account into a bucket and pushes them over to a BigQuery dataset. It will need to be modified to filter down to just the Kubernetes-related accounts.
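One way that filtering might look, sketched independently of the exporter's actual code: restrict the Cost Explorer request itself to an allowlist of linked accounts. The function name is hypothetical and the account IDs in the usage note are placeholders; the `Filter` and `GroupBy` shapes follow the Cost Explorer `GetCostAndUsage` API.

```python
import re


def cost_by_account_request(account_ids, start, end):
    """Build GetCostAndUsage arguments limited to the given linked accounts.

    start and end are YYYY-MM-DD strings; Cost Explorer treats end as exclusive.
    """
    ids = sorted(set(account_ids))
    if not ids:
        raise ValueError("at least one account ID is required")
    for account_id in ids:
        # AWS account IDs are 12-digit strings.
        if not re.fullmatch(r"[0-9]{12}", account_id):
            raise ValueError(f"not a 12-digit account ID: {account_id!r}")
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"}],
        "Filter": {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": ids}},
    }
```

With credentials for the payer account, this would be used as `boto3.client("ce").get_cost_and_usage(**cost_by_account_request(["111111111111"], "2022-10-01", "2022-11-01"))`.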

Nov 08 '22 20:11 BobyMCbobs

/priority important-longterm
/milestone v1.27

Nov 24 '22 19:11 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Feb 22 '23 19:02 k8s-triage-robot

/remove-lifecycle stale

Feb 22 '23 20:02 ameukam

We currently have these:
- GCP Data Studio Report
- console.cloud.google.com/billing
- AWS Cost

Working with Kubecost to consolidate information

Feb 22 '23 22:02 riaankleinhans

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

May 23 '23 23:05 k8s-triage-robot

/remove-lifecycle stale

May 24 '23 16:05 ameukam

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jan 21 '24 04:01 k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 22 '24 17:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

May 22 '24 17:05 k8s-triage-robot

/lifecycle frozen

As a stopgap, we publish high-level reports on overall spend to the SIG Slack weekly, and in the biweekly meetings we look at the AWS web console via someone who has access.

May 22 '24 19:05 BenTheElder
