cloud-platform

[WIP] Internal Cluster Service for accessing Prometheus & Alert Manager endpoints

Status: Open. sj-williams opened this issue 1 year ago • 2 comments

======AIMING TO DO THIS ONE THIS SPRINT (SPRINT 5)=======

Background

We have had a user support ticket asking whether it's possible for pods to access the following endpoints:

https://prometheus.live.cloud-platform.service.justice.gov.uk/api/v1/alerts
https://alertmanager.live.cloud-platform.service.justice.gov.uk/api/v2/alerts

Issue here: https://github.com/ministryofjustice/cloud-platform/issues/5074
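For context, a sketch of what a user pod might do with the two responses once it can reach them. It assumes the documented response shapes: Prometheus's `/api/v1/alerts` returns an object whose `data.alerts` entries carry `labels` and `state`, and Alertmanager's `/api/v2/alerts` returns a JSON list whose entries carry `labels` and `status.state`. `BASE_URL` is a hypothetical in-cluster address, not an existing service.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical in-cluster address that would serve the two
# endpoints to user pods; it does not exist today.
BASE_URL = "http://alerts-proxy.alerts-proxy.svc.cluster.local:8080"


def fetch_json(path):
    """GET a JSON document from BASE_URL + path."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def firing_alert_names(payload):
    """Alert names in a Prometheus /api/v1/alerts body whose state is 'firing'."""
    alerts = payload.get("data", {}).get("alerts", [])
    return sorted({a.get("labels", {}).get("alertname", "") for a in alerts
                   if a.get("state") == "firing"})


def active_alert_names(alerts):
    """Alert names in an Alertmanager /api/v2/alerts body (a list) that are 'active'."""
    return sorted({a.get("labels", {}).get("alertname", "") for a in alerts
                   if a.get("status", {}).get("state") == "active"})
```

A pod would then call, for example, `firing_alert_names(fetch_json("/api/v1/alerts"))`. Only read access is needed, which is what makes a GET-only route acceptable.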

Whilst it is possible to hit these endpoints from inside the cluster, we don't want to open a route between user namespaces and the monitoring namespace, since that would weaken the isolation between tenant workloads and the cluster's monitoring stack.

A solution to this might look like:

A proxy (nginx alone might suffice) in a dedicated namespace, possibly with an authentication layer, that forwards only GET requests for the endpoints above to the internal Prometheus and Alertmanager services in the monitoring namespace, plus a single NetworkPolicy for this dedicated namespace.
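A minimal sketch of the filtering logic, written in Python with only the standard library rather than nginx, so the allowlist is easy to read and test. The upstream Service names (`prometheus-operated`, `alertmanager-operated`), their ports, the listen port and the optional `API_KEY` bearer-token check are all assumptions for illustration, not the cluster's actual configuration.

```python
import hmac
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# ASSUMPTION: hypothetical in-cluster Service addresses in the monitoring
# namespace; the real names and ports may differ.
UPSTREAMS = {
    "/api/v1/alerts": "http://prometheus-operated.monitoring.svc:9090",
    "/api/v2/alerts": "http://alertmanager-operated.monitoring.svc:9093",
}

# ASSUMPTION: optional shared key (e.g. mounted from a Secret). None disables
# the check; a string requires "Authorization: Bearer <key>".
API_KEY = None


def upstream_url(method, path):
    """Return the upstream URL for an allowed request, or None to reject it."""
    if method != "GET":
        return None
    route, _, query = path.partition("?")
    base = UPSTREAMS.get(route)  # exact match only, so no other API paths leak
    if base is None:
        return None
    return base + route + ("?" + query if query else "")


def key_ok(auth_header, expected):
    """Check a bearer token in constant time; always passes if no key is set."""
    if expected is None:
        return True
    scheme, _, token = (auth_header or "").partition(" ")
    return scheme == "Bearer" and hmac.compare_digest(token.encode(), expected.encode())


class FilteringProxy(BaseHTTPRequestHandler):
    # Only do_GET is defined, so BaseHTTPRequestHandler answers any other
    # method with 501 before it reaches the allowlist.
    def do_GET(self):
        if not key_ok(self.headers.get("Authorization"), API_KEY):
            self.send_error(401)
            return
        target = upstream_url(self.command, self.path)
        if target is None:
            self.send_error(403)
            return
        try:
            with urllib.request.urlopen(target, timeout=10) as resp:
                status, body = resp.status, resp.read()
                ctype = resp.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as err:  # upstream answered with 4xx/5xx
            status, body = err.code, err.read()
            ctype = err.headers.get("Content-Type", "application/json")
        except urllib.error.URLError:  # upstream unreachable
            self.send_error(502)
            return
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the proxy on all interfaces; blocks until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), FilteringProxy).serve_forever()
```

Calling `serve()` in a pod in the dedicated namespace would expose only the two read-only routes; a POST to Alertmanager's `/api/v2/alerts` (which would create alerts) or a request for any other API path is refused. In nginx the same allowlist would be exact-match `location = /api/v1/alerts` and `location = /api/v2/alerts` blocks, each wrapping its `proxy_pass` in `limit_except GET { deny all; }`.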

This ticket is to look at implementing such a service.

Proposed user journey

Approach

Which part of the user docs does this impact

Communicate changes

[ ] post for #cloud-platform-update
[ ] Weeknotes item
[ ] Show the Thing/P&A All Hands/User CoP
[ ] Announcements channel

Questions / Assumptions

Definition of done

[ ] readme has been updated
[ ] user docs have been updated
[ ] another team member has reviewed
[ ] smoke tests are green
[ ] prepare demo for the team

Reference

How to write good user stories

Jan 17 '24 16:01 sj-williams

This is a work in progress, roughly 80% there. A couple of things are left to do:

Run the POC in the dev environment with the user
Consider an additional authentication layer (API key / basic auth), although a NetworkPolicy restricting access to the monitoring namespace to a single specific pod in the service namespace may be enough on its own?
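A sketch of the NetworkPolicy idea, built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts. It admits ingress to the selected monitoring pods only from pods that carry the proxy labels inside the proxy namespace; because `namespaceSelector` and `podSelector` sit in the same `from` entry, Kubernetes requires both to match. The policy name, the `alerts-proxy` namespace, the labels and the port are hypothetical. NetworkPolicies are additive, so this only narrows access if no other policy in monitoring already admits wider traffic.

```python
import json


def monitoring_ingress_policy(name, proxy_namespace, proxy_pod_labels,
                              target_pod_labels, port):
    """Build a NetworkPolicy (as a dict) for the monitoring namespace that admits
    TCP traffic on `port` to the target pods only from the proxy pod(s)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": "monitoring"},
        "spec": {
            # The monitoring pods this policy applies to.
            "podSelector": {"matchLabels": target_pod_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    # Both selectors sit in ONE peer entry, so a client must be
                    # in the proxy namespace AND carry the proxy pod labels.
                    "namespaceSelector": {"matchLabels": {
                        "kubernetes.io/metadata.name": proxy_namespace}},
                    "podSelector": {"matchLabels": proxy_pod_labels},
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # ASSUMPTION: hypothetical names and labels; check the real Prometheus pod
    # labels and port before applying, e.g. `python policy.py | kubectl apply -f -`.
    policy = monitoring_ingress_policy(
        "allow-alerts-proxy-to-prometheus", "alerts-proxy",
        {"app": "alerts-proxy"}, {"app.kubernetes.io/name": "prometheus"}, 9090)
    print(json.dumps(policy, indent=2))
```

A second call with the Alertmanager pod labels and port 9093 (again hypothetical) would cover the other endpoint.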

Feb 17 '24 10:02 sj-williams

Discussed in Sprint Planning (21/3); this will roll over into the next sprint.

Feb 21 '24 11:02 Matt-Alinosn