[configuration] Force type when doing environment variable substitution
What are you trying to achieve?
When substituting env variables like "00001" (as a string) (e.g. into an attribute), the type of the produced value is automatically Int. I want to be able to force it to type Str at substitution. Note, currently with attribute: convert it is done after the value is already substituted as an Int, so in this case it will be doing Int(1) => Str(1), which is still not the same as Str(00001).
** Context/Example **
With config
resource/common_labels:
attributes:
- key: some_id
value: "${id}"
action: upsert
When the env var id = "00001", the value of the attribute "some_id" will be Int(1). If we do a
- key: some_id
action: convert
converted_type: string
afterwards, the value is Str(1) which is still different from "00001" / Str(00001).
Seems like this should be transferred to opentelemetry-collector-contrib.
Filed https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/36183 though I feel like this is not a one-processor-specific problem.
hi @flyfire2002, we don't think this is a spec issue since the spec doesn't say anything about collector processors
I think this is indeed a spec issue as the issue is not with collector processors but with how the collector configuration system handles environment variable substitution. The expressed intent there is for the behavior to "support the exact same syntax as the Configuration WG." Here collector processors were only used as the medium for conveying a description of the issue.
If the behavior the Configuration WG expects is for an environment variable containing 0001 to be cast as an integer 1 before any further processing can happen with no mechanism for signaling that it should instead be treated as a string then I think the specification is defective. As an example, AWS account IDs are strings that match the regex ^[0-9]{12}$. That means 012345678912 is a valid AWS account ID that would be impossible to represent correctly in a configuration using environment variable substitution as it will be interpreted as the integer value 12345678912.
@open-telemetry/configuration-approvers can you review @Aneurysm9's https://github.com/open-telemetry/opentelemetry-specification/issues/4281#issuecomment-2603193985? thanks!
with no mechanism for signaling that it should instead be treated as a string
There are two mechanisms for this. First, if you want to ensure that an ambiguous value like 0001 is interpreted as string "0001" instead of an integer 1, you can put it in double quotes:
# input, with id=0001
key: "${id}" # yields string "0001"
You can see an example of this in the spec here.
Second, you can use YAML tags to coerce the parser to interpret the value as a specific type:
key1: 1 # yields integer 1
key2: !!str 1 # yields string "1"
key3: !!str ${id} # yields "0001"
There are two mechanisms for this. First, if you want to ensure that an ambiguous value like
0001is interpreted as string"0001"instead of an integer1, you can put it in double quotes:# input, with id=0001 key: "${id}" # yields string "0001"You can see an example of this in the spec here.
This does not work. Perhaps this is a defect in the collector's handling of substitutions. I'm also not sure the linked example is illustrating this as I can't tell whether "true" is being coerced to the boolean true and then inserted into the exogenous quotes. This would not have the expected results with an integer type as coercing it to an integer would make 0001 become 1, which was the behavior we observed.
To be more concrete, here's an example of a collector configuration chunk that has the issue:
ACCOUNT_ID=012345678912
attributes/add_labels:
actions:
- action: insert
key: account_id
value: ${env:ACCOUNT_ID} # evaluates to `12345678912`
Wrapping the substitution in quotes at this layer of processing has no effect as the input has already been cast as an integer. IOW, this configuration produced the same result, namely a truncated account ID:
attributes/add_labels:
actions:
- action: insert
key: account_id
value: "${env:ACCOUNT_ID}" # evaluates to `12345678912`
Wrapping the value in the environment variable in quotes such that they were part of the value prevented integer conversion and truncation, but then resulted in undesired literal quotes in the attribute value:
ACCOUNT_ID="\"012345678912\""
attributes/add_labels:
actions:
- action: insert
key: account_id
value: ${env:ACCOUNT_ID} # evaluates to `"012345678912"`
Ultimately, in this particular situation we were able to work around the issue by reading the environment variable with the quotes into a temporary attribute, extracting the desired digits from that temporary attribute, and then deleting the temporary attribute. This is obviously less than ideal and simply not an option in some situations.
attributes/add_labels:
actions:
- action: insert
key: tmp_account_id
value: ${env:ACCOUNT_ID}
- action: extract
key: tmp_account_id
pattern: (?<account_id>\d{12})
- action: delete
key: tmp_account_id
Second, you can use YAML tags to coerce the parser to interpret the value as a specific type:
key1: 1 # yields integer 1 key2: !!str 1 # yields string "1" key3: !!str ${id} # yields "0001"
Perhaps this is a collector defect, but I tried this:
ACCOUNT_ID="!!str 012345678912"
and the injected attribute value was !!str 012345678912 and not 012345678912 as expected. This seems to be the desired behavior based on the collector SIG's documentation.
I'm also not sure the linked example is illustrating this as I can't tell whether "true" is being coerced to the boolean true and then inserted into the exogenous quotes.
The string value of the environment variable is substituted in place of the env var reference as is, without coercion. Then, the YAML parser is response for taking the resulting value and resolving the tag type.
For example:
key: "true" # value resolves as type string, tag URI tag:yaml.org,2002:str
key: true # value resolves as type bool, tag URI tag:yaml.org,2002:bool
This behavior should produce the results you're looking for. Assuming ACCOUNT_ID=0001, I would expect the following:
key: "${env:ACCOUNT_ID}"
# expected
key: "0001" # value resolves as type string, tag URI tag:yaml.org,2002:str
Perhaps this is a defect in the collector's handling of substitutions Perhaps this is a collector defect
I can't say whether this is intentional design of collector or a defect, but I can say that this is not the intended or actual behavior of declarative config.
ACCOUNT_ID="!!str 012345678912"
I would have expected you to do ACCOUNT_ID="012345678912", and:
key: !!str ${env:ACCOUNT_ID}
That should result in a value resolved to type string, tag URI tag:yaml.org,2002:str.
This behavior should produce the results you're looking for. Assuming
ACCOUNT_ID=0001, I would expect the following:key: "${env:ACCOUNT_ID}" # expected key: "0001" # value resolves as type string, tag URI tag:yaml.org,2002:str
Yes. That's exactly the behavior I would expect as well. It is not the behavior that was encountered nor is it the documented expected behavior from the collector SIG where they believe they are aligning with this specification.
I can't say whether this is intentional design of collector or a defect, but I can say that this is not the intended or actual behavior of declarative config.
This is worrying, then, as the collector SIG's docs say:
This means we support the exact same syntax as the Configuration WG.
If the collector SIG doesn't correctly interpret this spec, can anyone?
I would have expected you to do
ACCOUNT_ID="012345678912", and:key: !!str ${env:ACCOUNT_ID}
That doesn't work because the ${env:ACCOUNT_ID} value has already been parsed as an integer by the time it is evaluated as part of the larger structure. See also https://github.com/open-telemetry/opentelemetry-collector/issues/8565#issuecomment-1744750712 which indicates the expectation that the !!str modifier would need to be in the environment variable value to ensure the environment variable was evaluated to a string.
@mx-psi can you or someone else from @open-telemetry/collector-maintainers take a look at this and confirm where the issue lies?
If the collector SIG doesn't correctly interpret this spec, can anyone?
Here's a unit test in the java implementation showing how it resolves types when env var substitution is used, demonstrating with and without quotes: https://github.com/open-telemetry/opentelemetry-java/blob/13e2b8c4dfeba09f361b47986febbd792ab1ae60/sdk-extensions/incubator/src/test/java/io/opentelemetry/sdk/extension/incubator/fileconfig/FileConfigurationParseTest.java#L591-L659
That doesn't work because the ${env:ACCOUNT_ID} value has already been parsed as an integer by the time it is evaluated as part of the larger structure.
Oh I see. It depends on how the implementation substitutes the value and when the type is resolved. In the java implementation a reference, given !!str ${env:ACCOUNT_ID}, the env var reference is replaced as is, without any attempt to interpret its type. After all env vars values are all strings and carry no type. So we end up with !!str 012345678912, which a YAML parser should resolve to type string with tag URI tag:yaml.org,2002:str.
I'd argue this is the correct implementation. It seems like the collector is trying to interpret the type of the env var value and injecting a string representation of the interpreted type. If this is indeed the case, I'm not sure what the advantage of this could be.
It seems like the collector is trying to interpret the type of the env var value and injecting a string representation of the interpreted type. If this is indeed the case, I'm not sure what the advantage of this could be.
Oh, it's (better|worse) than that. It's not injecting a string representation of the interpreted type, it is evaluating the contents of the environment variable as a YAML document and injecting the resulting value in the place where the placeholder had been, potentially as a structure in its own right. This is wonderful if you want to have entire chunks of your configuration injected from environment variables. Not so much when you need to ensure a value is treated as a string.
I need more time to catch up with this discussion, but I think the particular case Anthony is referring to is problematic for the Collector since the attributes processor has an any type in this field, which doesn't play nicely when you want to have a particular type. I am honestly not sure what the spec would say in this particular situation.
Also note that the Collector does not do this exactly
it is evaluating the contents of the environment variable as a YAML document and injecting the resulting value in the place where the placeholder had been, potentially as a structure in its own right.
Depending on the value, it may instead be keeping two possible representations of the value: both a string and the result of unmarshaling said string into an any type, and then use the target type to decide which one of the two representations to use. This is not exactly what the spec says but I (incorrectly?) assumed it would lead to the same behavior.
As I said, I'll catch up and give a proper reply here once I have done so.