synthetic-monitoring-app
Provision the plugin with datasource UIDs instead of names
Renaming a stack in Cloud results in the SM plugin losing track of which Prometheus and Loki datasources it should be talking to. This happens because the Prometheus and Loki datasources get renamed when the stack gets renamed, but the provisioning that informs the plugin doesn't have the new names. The plugin identifies datasources by name, so things stop working. We should look into updating the plugin provisioning to use UIDs instead (if possible). UIDs will be more stable, and the SM plugin will continue to work without interruption across renames.
Thanks for creating this, @rdubrock! Some background from the HG side:
- Currently in production the `grafanaName` in the JSON data isn't changed for SM, but I'm working on changing the datasource names now as the datasources are renamed (issue here). This will still require an uninstall and reinstall, though, to get SM up and running again.
- HG names datasources as `grafanacloud-<slug>-logs` but keeps the UIDs consistent across all instances so that we are able to key off of those (they will look like `grafanacloud-logs` or `grafanacloud-prom`).
- HG will provision the datasources via a YAML file, with the UID specified, so the UID will exist from the start (in Cloud at least).
Also, in case it's helpful to reference, the k8s integration uses the UID of the Prometheus datasource to grab the scrape interval from that config.
Adding some thoughts after a bit of looking:
- We have to support looking up the datasources by name AND UID. Since the SM plugin can be installed anywhere, we aren't guaranteed to have provisioning or the consistent Cloud UIDs.
- We can use UIDs instead of names for generating the SM dashboards
- We're going to need to do some sort of migration for existing tenants to move them from name => uid in cloud
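To make the points above concrete, here is a rough sketch of what a UID-first lookup with a name fallback, plus a name-to-UID migration step, might look like. This is not the plugin's actual code: the `DataSourceInfo` and `DataSourceRef` shapes and the `resolveDataSource` and `migrateRef` helpers are hypothetical names made up for illustration.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
interface DataSourceInfo {
  uid: string;
  name: string;
  type: string;
}

// What the plugin's provisioning might store for each datasource.
interface DataSourceRef {
  uid?: string;
  name?: string;
}

// Prefer the UID, which survives renames; fall back to the name for
// installs that have no provisioned UID.
function resolveDataSource(
  available: DataSourceInfo[],
  ref: DataSourceRef
): DataSourceInfo | undefined {
  if (ref.uid) {
    const byUid = available.find((ds) => ds.uid === ref.uid);
    if (byUid) {
      return byUid;
    }
  }
  if (ref.name) {
    return available.find((ds) => ds.name === ref.name);
  }
  return undefined;
}

// Migration step: if the stored reference still resolves, rewrite it so
// that it carries the UID (and the current name) going forward.
function migrateRef(
  available: DataSourceInfo[],
  ref: DataSourceRef
): DataSourceRef {
  const match = resolveDataSource(available, ref);
  return match ? { uid: match.uid, name: match.name } : ref;
}
```

With something like this, a reference that stores `uid: 'grafanacloud-logs'` would keep resolving after the stack slug changes, while a reference that only stores the old name would not, which is why existing tenants would need to be migrated before a rename happens.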
Since this is merged, can this issue be closed or is there missing work?
We got a report that renaming stacks will break SM until the datasource is deleted, so there is still an issue here.
This should be fixed with version 1.11.0