mimir
mimir copied to clipboard
Helm: unify NGINX and gateway values sections
What this PR does
Unifies the GEM gateway and NGINX deployment under the same section in the values file - gateway
. The idea is to remove the nginx-specific values in release 7.0.0.
In doing so it also introduces the following changes:
- nginx and the gateway now use the same Service
- nginx and the gateway now use the same Deployment
- nginx and the gateway now use the same Ingress
These three together make it seamless to migrate from Mimir to GEM.
By unifying the two the GEM gateway now has autoscaling and support for OpenShift Route.
The PR also add a migration guide from nginx to gateway.
Which issue(s) this PR fixes or relates to
Fixes #2203
Checklist
- [x] Tests updated
- [x] Documentation added
- [x]
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
Unblocking solely from a docs perspective; this is massive and I am happy to check things doc-related in a subsequent PR if needed.
Thanks for the quick comments!
I'm not sure about @krajorama 's availability, but I imagine that this PR will not be ready for merge this week in terms of helm changes. So feel free to leave as much feedback as you have, @osg-grafana
@osg-grafana i will try to rework this PR with Patrick's suggestion in this comment https://github.com/grafana/mimir/issues/2203#issuecomment-1274015213
I'll move this PR to draft until I'm done. Can you please not review the docs until then?
this is again ready for review
I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss
what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too
I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss
what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too
What are the user’s | operator’s needs with regard to zero downtime? I am also reluctant without direct input from those impacted.
I think the migration is a bit complicated, can we trade a little downtime somehow to make it simpler? nginx->gateway? let's discuss
what do you have in mind? I'm happy to make some steps optional and say that they are there only for zero-downtime. But I'm reluctant to not provide a zero-downtime migration. We can discuss during our 1:1 too
What are the user’s | operator’s needs with regard to zero downtime? I am also reluctant without direct input from those impacted.
We've discussed this irl, and the conclusion was that the current procedure is good enough. Just have to make sure that NOTES.txt and metamonitoring works as expected in case someone uses those nameOverrides
I pushed updated notes in https://github.com/grafana/mimir/pull/3181/commits/69ae750a78e2c55662026c226e89f7dff76a17e5
@dimitarvdimitrov One more thing, we'll need to update the NOTES.txt to include the gateway ingress appropriately.
https://github.com/grafana/mimir/blob/dimitar/helm-mimir-gem-migration/operations/helm/charts/mimir-distributed/templates/NOTES.txt
Right now it says this even though I have the gateway ingress enabled with OSS mimir:
Welcome to Grafana Mimir!
Remote write endpoints for Prometheus or Grafana Agent:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
http://mimir-gateway.mimir.svc:80/api/v1/push
Read address, Grafana data source (Prometheus) URL:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
http://mimir-gateway.mimir.svc:80/prometheus
@Logiraptor, @krajorama will be on PTO for the next two weeks and is busy this week. Do you have capacity to review this PR in the coming two weeks? Alternatively, we can wait for Krajo to come back to get a review. I don't think this is that critical.