Steve Williams

Results 72 issues of Steve Williams

## Background While testing the Kubernetes 1.27 upgrade process, we uncovered issues with `terraform apply` at the components level when reading k8s manifest objects. This has been tracked down...

Infrastructure
eks-1.27-upgrade

## Background There is a new major release of fluent-bit available; we are currently on version `2.2.1`. https://fluentbit.io/announcements/v3.0.0/ https://fluentbit.io/announcements/v3.0.1/ The associated Helm release is `v0.46.1`: https://github.com/fluent/helm-charts/blob/fluent-bit-0.46.1/charts/fluent-bit/Chart.yaml#L8-L9 ## Approach Fluent don't publish...

## Background We are currently running `v1.24.0` of `aws-ebs-csi-driver`, using Helm chart release [2.24.0](https://github.com/ministryofjustice/cloud-platform-terraform-eks-csi/blob/main/main.tf#L7). Although there is no definitive documentation around k8s version compatibility, we can check supported EKS versions...

## Background We have a [user requesting](https://mojdt.slack.com/archives/C57UPMZLY/p1712926957089949) the ability to configure alerts based on S3 bucket access (GET/POST requests, success/fail/auth etc). Out of the box, CloudWatch provides Storage metrics for...

support-team

## Background We are still experiencing Prometheus restarts (4/12 & 5/12) triggering high priority alerts and prompting queries from CP users in the ask channel. Following the approach of previous[...

operations-driven-engineering

======AIMING TO DO THIS ONE THIS SPRINT (SPRINT 5)======= ## Background We have had a user support ticket asking whether it's possible to have pod access to the following endpoints:...

Environments

## Background We are now fairly routinely adding new pipelines and jobs to Concourse for automating chore-like tasks (cordon drains, custom deletes, etc.). It would probably be nice to...

## Background We currently have our ingress controller replica count set to 12 for both `default` and `modsec` flavours of our deployments. We should review this count and consider whether...

Infrastructure

## Background Find out in what scenarios a pod can increase the underlying node's CPU usage. RShiny problems were tracked down to a liveness probe regularly hitting an endpoint that opened a...

Environments

## Background We have seen that some modules are drifting behind with things like terraform providers, for example the RDS module: https://github.com/ministryofjustice/cloud-platform-terraform-rds-instance/blob/main/versions.tf#L6 Compared to our infra code & modules which...