mimir icon indicating copy to clipboard operation
mimir copied to clipboard

Helm: unify NGINX and gateway values sections

Open dimitarvdimitrov opened this issue 2 years ago • 8 comments

What this PR does

Unifies the GEM gateway and NGINX deployment under the same section in the values file - gateway. The idea is to remove the nginx-specific values in release 7.0.0.

In doing so it also introduces the following changes:

  • nginx and the gateway now use the same Service
  • nginx and the gateway now use the same Deployment
  • nginx and the gateway now use the same Ingress

These three together make it seamless to migrate from Mimir to GEM.

By unifying the two the GEM gateway now has autoscaling and support for OpenShift Route.

The PR also add a migration guide from nginx to gateway.

Which issue(s) this PR fixes or relates to

Fixes #2203

Checklist

  • [x] Tests updated
  • [x] Documentation added
  • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

dimitarvdimitrov avatar Oct 10 '22 17:10 dimitarvdimitrov

Unblocking solely from a docs perspective; this is massive and I am happy to check things doc-related in a subsequent PR if needed.

Thanks for the quick comments!

I'm not sure about @krajorama 's availability, but I imagine that this PR will not be ready for merge this week in terms of helm changes. So feel free to leave as much feedback as you have, @osg-grafana

dimitarvdimitrov avatar Oct 10 '22 17:10 dimitarvdimitrov

@osg-grafana i will try to rework this PR with Patrick's suggestion in this comment https://github.com/grafana/mimir/issues/2203#issuecomment-1274015213

I'll move this PR to draft until I'm done. Can you please not review the docs until then?

dimitarvdimitrov avatar Oct 11 '22 07:10 dimitarvdimitrov

this is again ready for review

dimitarvdimitrov avatar Oct 11 '22 17:10 dimitarvdimitrov

I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss

what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too

dimitarvdimitrov avatar Oct 12 '22 12:10 dimitarvdimitrov

I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss

what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too

What are the user’s | operator’s needs with regard to zero downtime? I am also reluctant without direct input from those impacted.

osg-grafana avatar Oct 12 '22 14:10 osg-grafana

I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss

what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too

What are the user’s | operator’s needs with regard to zero downtime? I am also reluctant without direct input from those impacted.

We've discussed this irl, and the conclusion was that the current procedure is good enough. Just have to make sure that NOTES.txt and metamonitoring works as expected in case someone uses those nameOverrides

krajorama avatar Oct 13 '22 13:10 krajorama

I pushed updated notes in https://github.com/grafana/mimir/pull/3181/commits/69ae750a78e2c55662026c226e89f7dff76a17e5

dimitarvdimitrov avatar Oct 13 '22 15:10 dimitarvdimitrov

@dimitarvdimitrov One more thing, we'll need to update the NOTES.txt to include the gateway ingress appropriately.

https://github.com/grafana/mimir/blob/dimitar/helm-mimir-gem-migration/operations/helm/charts/mimir-distributed/templates/NOTES.txt

Right now it says this even though I have the gateway ingress enabled with OSS mimir:

Welcome to Grafana Mimir!

Remote write endpoints for Prometheus or Grafana Agent:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
  http://mimir-gateway.mimir.svc:80/api/v1/push

Read address, Grafana data source (Prometheus) URL:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
  http://mimir-gateway.mimir.svc:80/prometheus

Logiraptor avatar Oct 16 '22 05:10 Logiraptor

@Logiraptor, @krajorama will be on PTO for the next two weeks and is busy this week. Do you have capacity to review this PR in the coming two weeks? Alternatively, we can wait for Krajo to come back to get a review. I don't think this is that critical.

dimitarvdimitrov avatar Oct 25 '22 08:10 dimitarvdimitrov