k8s.io
AWS Billing visibility
see previously: https://github.com/kubernetes/k8s.io/issues/4141#issuecomment-1272628985
TL;DR: We should have some visibility into where we stand on AWS resource consumption; right now I don't think even the SIG K8s Infra chairs and/or Steering know this. On GCP we have a public report in SIG K8s Infra with a detailed breakdown, so we at least have some idea where we stand there.
/assign @hh @Riaankl @calebamiles
/area billing
/committee steering
/unassign @calebamiles
/assign @BobyMCbobs
/unassign @calebamiles
/assign @BobyMCbobs
Oops. 😅
Amazon CloudWatch publishes billing-related metrics in the AWS/Billing metrics namespace. We could collect those and aggregate them with other cost data.
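For reference, reading those billing metrics looks roughly like the sketch below (a hedged example, not the project's actual tooling; `billing_metric_request` is a hypothetical helper). Note billing metrics are only published in us-east-1:

```python
# Sketch of reading the AWS/Billing EstimatedCharges metric with boto3.
# The period and statistic here are illustrative choices, not project config.
from datetime import datetime, timedelta

def billing_metric_request(hours=24):
    """Build get_metric_statistics parameters for total estimated charges."""
    now = datetime.utcnow()
    return {
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        # Billing metrics are dimensioned by Currency (and optionally ServiceName).
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 21600,  # the metric is only published every few hours
        "Statistics": ["Maximum"],
    }

# Actual call (requires credentials; commented out in this sketch):
# import boto3
# cw = boto3.client("cloudwatch", region_name="us-east-1")
# resp = cw.get_metric_statistics(**billing_metric_request())
```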
I have worked out how to do this and have a functional demo to show at this week's meeting.
TL;DR:
- S3 metrics:
  - CloudWatch metrics are a bit meh (a mix of daily and per-minute metrics) :(
  - Gathering some of the metrics has cost implications; I'm not sure exactly how much.
- For cost:
  - Enable CURs (Cost and Usage Reports) and export them to S3
  - Use Glue to catalogue the data for querying in Athena
  - Crunch the data in Athena
  - Visualise it in Grafana
  - I wouldn't bother using CloudWatch to read billing metrics at all; they are stale and carry less detail than the CURs
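Once the CUR data is queryable in Athena, the "crunch" step is ordinary SQL over the standard CUR columns. A minimal sketch (the `"cur_db"."cur_table"` names are placeholders, not our real database/table; the column names are the standard CUR Athena columns):

```python
# Build an Athena SQL query summarising monthly cost per service from a CUR
# table. The database/table names are placeholders; year/month are the usual
# string-typed CUR partitions.
def monthly_cost_by_service(year: int, month: int) -> str:
    return f"""
SELECT line_item_product_code AS service,
       SUM(line_item_unblended_cost) AS cost
FROM "cur_db"."cur_table"
WHERE year = '{year}' AND month = '{month}'
GROUP BY line_item_product_code
ORDER BY cost DESC
""".strip()
```

Grafana's Athena data source can run a query like this directly once the cross-account issue below is resolved.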
As a side note, we should probably start looking at using ArgoCD to deploy k8s manifests instead. Knative uses ArgoCD for its Kubernetes manifests; have a look at https://github.com/knative/test-infra/issues/3363 for details.
Notes:
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-request-metrics-bucket.html
- https://docs.aws.amazon.com/cur/latest/userguide/cur-query-athena.html
- https://docs.aws.amazon.com/cur/latest/userguide/cur-create.html (don't forget to include the resource IDs)
- There is a bug in Grafana blocking cross-account access, which we need because Athena lives in the Master Payer account while the S3 metrics live in the registry.k8s.io account: https://github.com/grafana/grafana/issues/57664
- https://github.com/borg-land/k8s-demo/blob/dev/monitoring/values.yaml#L3
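For the "don't forget the resource IDs" point, enabling them on the CUR means asking for the RESOURCES schema element when the report is defined. A hedged sketch (bucket, prefix, and report name are placeholders, not the project's real values):

```python
# Sketch of a CUR report definition with per-resource line items enabled
# (AdditionalSchemaElements=["RESOURCES"]). The ATHENA additional artifact
# requires Parquet format and OVERWRITE_REPORT versioning.
def cur_report_definition(bucket: str, report_name: str = "k8s-cur"):
    return {
        "ReportName": report_name,
        "TimeUnit": "HOURLY",
        "Format": "Parquet",
        "Compression": "Parquet",
        "AdditionalSchemaElements": ["RESOURCES"],  # include resource IDs
        "S3Bucket": bucket,
        "S3Prefix": "cur/",
        "S3Region": "us-east-1",
        "AdditionalArtifacts": ["ATHENA"],  # sets up the Glue/Athena integration
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_REPORT",
    }

# Actual call (requires credentials in the payer account):
# import boto3
# boto3.client("cur", region_name="us-east-1").put_report_definition(
#     ReportDefinition=cur_report_definition("example-cur-bucket"))
```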


Great job @upodroid !!
I built this some time ago; hopefully it can be put to some good use: https://github.com/kubernetes/k8s.io/tree/main/infra/aws/aws-costexplorer-export
It currently exports all the Cost Explorer reports from the root-level account into a bucket and pushes them to a BigQuery dataset. It will need to be modified to filter down to just the Kubernetes-related accounts.
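The account filtering could be done at query time with a Cost Explorer dimension filter. A rough sketch (the account ID and helper name are placeholders for illustration, not the exporter's actual code):

```python
# Sketch of restricting a Cost Explorer query to specific linked accounts
# using the LINKED_ACCOUNT dimension filter.
def cost_request(account_ids, start, end):
    return {
        "TimePeriod": {"Start": start, "End": end},  # ISO dates, end exclusive
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "Dimensions": {"Key": "LINKED_ACCOUNT", "Values": list(account_ids)}
        },
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }

# Actual call (requires credentials; commented out in this sketch):
# import boto3
# ce = boto3.client("ce")
# resp = ce.get_cost_and_usage(
#     **cost_request(["111122223333"], "2023-01-01", "2023-02-01"))
```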
/priority important-longterm
/milestone v1.27
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
We currently have these:
- GCP Data Studio Report
- console.cloud.google.com/billing
- AWS Cost
Working with Kubecost to consolidate information
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/lifecycle frozen
As a stopgap, we publish high-level reports on overall spend to the SIG Slack weekly, and in the biweekly meetings we look at the AWS web console via someone who has access.