config tenant override works even when it's not set
Describe the bug
Metrics-generator enable_client_server_prefix doesn't actually enable the client server prefix without also setting metrics_generator_processor_service_graphs_enable_client_server_prefix to true.
[Our configuration with](https://github.com/utilitywarehouse/opentelemetry-manifests/blob/main/tempo/base/config.yaml#L29) enable_client_server_prefix and without tenant override doesn't add prefixes to metric labels.
Request to /status/config confirms, that the enable_client_server_prefix is set to true. However, it also reports that metrics_generator_processor_service_graphs_enable_client_server_prefix is set to false, even though it is not defined in our configuration.
Setting in our config metrics_generator_processor_service_graphs_enable_client_server_prefix to true resolves the issue- prefixes are correctly added.
We are using Tempo version 2.2.0, which should have this feature enabled. To Reproduce
- Set
enable_client_server_prefixin metric generator config to true, while not setttingmetrics_generator_processor_service_graphs_enable_client_server_prefix. - Query the Prometheus instance for serviceGraph metrics
- See, if metric labels are duplicated for client and server. Sadly, they are not.
- Add enabled
metrics_generator_processor_service_graphs_enable_client_server_prefixto your configuration - Query the Prometheus instance again- now the metric labels are duplicated correctly.
Expected behavior Overrides should not work until they are defined
Environment:
- Infrastructure: Kubernetes (manifests)
- Tempo version 2.2.0
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
I am experiencing this with 2.6.0.
The following config does not properly prefix labels:
metrics_generator:
processor:
service_graphs:
dimensions:
- foo
enable_client_server_prefix: true
Adding the following (as suggested in https://github.com/utilitywarehouse/opentelemetry-manifests/pull/15) works around it:
overrides:
metrics_generator_processor_service_graphs_enable_client_server_prefix: true
I think this code is causing the issue:
https://github.com/grafana/tempo/blob/e33d407c5de4132468307db880d84d7d542710bd/modules/generator/config.go#L130
When we initialize the service graph processor, we copy the setting from the overrides without checking if a value is already set or not.
But... because this field is a bool, it's actually impossible to tell if it has been set or not. It can either be false or true, in which false could mean "I've set this to false" or "I have not set this, it defaulted to false".
So unfortunately this configuration field has actually become useless after we added the override for it.
The value will always be replaced by the value in the overrides. Meaning if you don't set overrides for your tenant it will be reset to false.
It's a known shortcoming that our overrides don't allow leaving a field unset. Empty fields will always get the default value for their type, which for boolean values is false. For string and numbers this is less of an issue.
In the user-configurable overrides we have addressed this by using *bool instead. So if we ever refactor the runtime overrides entirely we could maybe fix this too.
For clarity I'd suggest removing the configuration option from regular config block and only allow it to be set through overrides.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.