promscale
promscale copied to clipboard
Allow configuring HA replica and cluster label names
Description
Fixes #1578 .
This PR adds two new flags to allow configuring the replica and cluster label names. Currently both of the label names are not configurable but are required for HA to work.
Merge requirements
Please take into account the following non-code changes that you may need to make with your PR:
- [ ] CHANGELOG entry for user-facing changes
- [ ] Updated the relevant documentation
@arajkumar thanks for catching that. Yes, this will trouble the telemetry.
I guess we need to propagate the user provided value.
We should not do that. This will be breaking, if multiple Promscale are running. Also, we should remember that the reason for this change is limited to testing/benchmarking purpose.
I think we need to discuss this on joint standup. Maybe instead of __, we just use _, like _replica_, which should be allowed. Please correct me @onprem if I am wrong.
How about if we translate any configured value into __cluster__ and translate it back to configured value when queried? Just shooting in the dark :)
@onprem The 0.15.0 release will happen this week. If you wish to include this PR in the release. Can you follow up with the reviewers accordingly?
@VineethReddy02 we need to make changes to the telemetry engine to propagate the telemetry information.
@arajkumar how about if we do not touch the telemetry implementation. In that way, metrics_ha_cluster_count works only when someone is using __cluster__ label. Otherwise, we do not get the "count" thingy.
For knowing if ha feature is used or not, we add another telemetry metrics_ha_enabled which is just a boolean. It is true if -metrics.high-availability flag is applied, otherwise false. This information will be filled as we start promscale, in our very first write of telemetry. WDYT?
Note: This means once ha is started, it will always remain true in telemetry. But that said, I don't think users will stop using ha if they started to use it.
@Harkishen-Singh in terms of effort what would you say to make this configurable?
Is it something like the following?
- Add the entry in the configuration for the label
- Pass the configurable label to the telemetry engine
- Pass down the label to the extension. Maybe for this part we could create a new version of the functions in the extention that accepts the label, that way we could make it backwards compatible.
Maybe something like:
_ps_catalog.promscale_sql_telemetry_with_cluster_label(label TEXT)
_ps_catalog.promscale_sql_telemetry() would just return promscale_sql_telemetry_with_custom_cluster_label("cluster")
BTW I haven't looked into the telemetry engine and don't know the implications of a change like this.
_ps_catalog.promscale_sql_telemetry() would just return promscale_sql_telemetry_with_custom_cluster_label("cluster")
Do you mean _ps_catalog.promscale_sql_telemetry() will call the .._with_custom_cluster?
In anyways, we will need a change in extension for this.
On the telemetry engine side, we need to pass down the cluster label, and the engine will specifically call another func (we will need to create this), that will look for the new cluster label name and update in the metadata table. This will be 30-40 lines of code at max.
What we should do right now?
Since telemetry is not a priority at the moment and most users will use the default values in the flags, I suggest we leave the telemetry specific to __cluster__ only and make an issue for this. We will fix this issue in the next iteration of telemetry improvements, which will happen with the doc that Vineeth created for new telemetry requirements.
Thoughts? @arajkumar @alejandrodnm
ping @onprem
Heya @Harkishen-Singh, sorry for the delay on this, finally found some time to work on this. I have moved the configuration into ha package as you suggested, as well as rebased the PR to resolve some merge conflicts. This PR should be ready for another round of reviews.
I will also suggest to squash the commits before merging the PR @onprem.
@onprem we should also add an entry to the changelog before merging this PR.