promscale icon indicating copy to clipboard operation
promscale copied to clipboard

Allow configuring HA replica and cluster label names

Open onprem opened this issue 3 years ago • 6 comments

Description

Fixes #1578 .

This PR adds two new flags to allow configuring the replica and cluster label names. Currently both of the label names are not configurable but are required for HA to work.

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR:

  • [ ] CHANGELOG entry for user-facing changes
  • [ ] Updated the relevant documentation

onprem avatar Sep 07 '22 13:09 onprem

@arajkumar thanks for catching that. Yes, this will trouble the telemetry.

I guess we need to propagate the user provided value.

We should not do that. This will be breaking, if multiple Promscale are running. Also, we should remember that the reason for this change is limited to testing/benchmarking purpose.

I think we need to discuss this on joint standup. Maybe instead of __, we just use _, like _replica_, which should be allowed. Please correct me @onprem if I am wrong.

harkishen avatar Sep 12 '22 08:09 harkishen

How about if we translate any configured value into __cluster__ and translate it back to configured value when queried? Just shooting in the dark :)

arajkumar avatar Sep 14 '22 13:09 arajkumar

@onprem The 0.15.0 release will happen this week. If you wish to include this PR in the release. Can you follow up with the reviewers accordingly?

VineethReddy02 avatar Sep 19 '22 17:09 VineethReddy02

@VineethReddy02 we need to make changes to the telemetry engine to propagate the telemetry information.

@arajkumar how about if we do not touch the telemetry implementation. In that way, metrics_ha_cluster_count works only when someone is using __cluster__ label. Otherwise, we do not get the "count" thingy.

For knowing if ha feature is used or not, we add another telemetry metrics_ha_enabled which is just a boolean. It is true if -metrics.high-availability flag is applied, otherwise false. This information will be filled as we start promscale, in our very first write of telemetry. WDYT?

Note: This means once ha is started, it will always remain true in telemetry. But that said, I don't think users will stop using ha if they started to use it.

harkishen avatar Sep 21 '22 08:09 harkishen

@Harkishen-Singh in terms of effort what would you say to make this configurable?

Is it something like the following?

  • Add the entry in the configuration for the label
  • Pass the configurable label to the telemetry engine
  • Pass down the label to the extension. Maybe for this part we could create a new version of the functions in the extention that accepts the label, that way we could make it backwards compatible.

Maybe something like:

_ps_catalog.promscale_sql_telemetry_with_cluster_label(label TEXT)

_ps_catalog.promscale_sql_telemetry() would just return promscale_sql_telemetry_with_custom_cluster_label("cluster")

BTW I haven't looked into the telemetry engine and don't know the implications of a change like this.

alejandrodnm avatar Sep 21 '22 10:09 alejandrodnm

_ps_catalog.promscale_sql_telemetry() would just return promscale_sql_telemetry_with_custom_cluster_label("cluster")

Do you mean _ps_catalog.promscale_sql_telemetry() will call the .._with_custom_cluster?

In anyways, we will need a change in extension for this.

On the telemetry engine side, we need to pass down the cluster label, and the engine will specifically call another func (we will need to create this), that will look for the new cluster label name and update in the metadata table. This will be 30-40 lines of code at max.

What we should do right now? Since telemetry is not a priority at the moment and most users will use the default values in the flags, I suggest we leave the telemetry specific to __cluster__ only and make an issue for this. We will fix this issue in the next iteration of telemetry improvements, which will happen with the doc that Vineeth created for new telemetry requirements.

Thoughts? @arajkumar @alejandrodnm

harkishen avatar Sep 23 '22 10:09 harkishen

ping @onprem

harkishen avatar Oct 11 '22 09:10 harkishen

Heya @Harkishen-Singh, sorry for the delay on this, finally found some time to work on this. I have moved the configuration into ha package as you suggested, as well as rebased the PR to resolve some merge conflicts. This PR should be ready for another round of reviews.

onprem avatar Oct 25 '22 07:10 onprem

I will also suggest to squash the commits before merging the PR @onprem.

harkishen avatar Nov 09 '22 13:11 harkishen

@onprem we should also add an entry to the changelog before merging this PR.

harkishen avatar Dec 09 '22 09:12 harkishen