Support POSIX compliant environment variables for multipart names
It is not currently possible to combine serde's kebab-case, SCREAMING-KEBAB-CASE, snake_case, or SCREAMING_SNAKE_CASE renaming for multipart names with POSIX-compliant environment variables for setting config values. Setting the separator to _ is sufficient for single-word names. Multipart names, however, still require a hyphen in the kebab variants, and map incorrectly for both the kebab and snake variants.
Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit.
[citation]
Consider the following example structs, which use kebab case because it makes the semantic issues easier to see:
use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct TestConfig {
    single: String,
    simple: SimpleInner,
    value_with_multipart_name: String,
    inner_config: ComplexInner,
}

#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct SimpleInner {
    val: String,
}

#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct ComplexInner {
    another_multipart_name: String,
}

fn get_environment() -> config::Environment {
    // Split nested keys on a single underscore.
    config::Environment::with_prefix("PREF").separator("_")
}
The simple case works fine here. PREF_SINGLE=val0 will override TestConfig.single as expected.
SimpleInner is a bit more complex, but also works fine. The dot separators between levels are replaced with underscores, so PREF_SIMPLE_VAL=val2 maps correctly to TestConfig.simple.val.
Things break down with multipart names, however. TestConfig.value_with_multipart_name would require an environment variable of PREF_VALUE-WITH-MULTIPART-NAME, which isn't shell compatible. Naively, you might imagine fixing this by converting POSIX separators, _, into kebab separators, -, but this does not work either. To explain why, let's look at the behavior of a slightly more complex variation...
For a multilevel value, TestConfig.inner_config.another_multipart_name would initially need an environment variable of PREF_INNER-CONFIG_ANOTHER-MULTIPART-NAME. Level separators are converted into . under the hood, so the key we need to match is inner-config.another-multipart-name. If you then use only underscores and apply a separator config of _, every instance of that separator is converted into a ., which yields inner.config.another.multipart.name. That would map to a structure with 5 levels of single-name fields! :flushed: This is the same problem we ran into above: PREF_VALUE_WITH_MULTIPART_NAME would become a key of value.with.multipart.name, a 4-level field, rather than the single level we actually want. Interestingly, this issue also exists for multipart names with snake_case and SCREAMING_SNAKE_CASE, which don't require a code change to trigger the leveling inconsistency.
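To make the mapping concrete, here is a minimal std-only sketch of the behavior described above. It is a simplified model, not the library's actual code: the function name env_to_key is hypothetical, and it assumes the prefix is joined to the rest of the name by the same separator.

```rust
/// Simplified model (an assumption, not the library's code) of how a
/// prefixed environment variable name is turned into a config key.
fn env_to_key(var: &str, prefix: &str, separator: &str) -> Option<String> {
    // Assumption: the prefix is followed by one separator.
    let prefix_pattern = format!("{}{}", prefix, separator);
    let rest = var.strip_prefix(prefix_pattern.as_str())?;
    // The remainder is lowercased and every separator becomes a level boundary.
    Some(rest.to_lowercase().replace(separator, "."))
}
```

Under this model, with a prefix of PREF and a separator of _, PREF_SIMPLE_VAL becomes simple.val as intended, and PREF_INNER-CONFIG_ANOTHER-MULTIPART-NAME becomes inner-config.another-multipart-name. The POSIX-compliant PREF_VALUE_WITH_MULTIPART_NAME, however, becomes value.with.multipart.name, because nothing distinguishes an underscore inside a name from one between levels.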
I initially intended to submit a PR for this, however the current structure of the library doesn't provide enough context at the time of parsing for the distinctions to be made. In order to properly map with a single separator, the structure of the struct/enum must be known at the site of environment parsing. There is a corner case possible when using a single separator, if there exist conflicting names, like so:
#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct TestConfig {
    foo_bar: String,
    foo: Foo,
}

#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
struct Foo {
    bar: String,
}
Both fields would expect PREF_FOO_BAR. One workaround would be to repeat the separator at level boundaries, resulting in PREF_FOO_BAR for the single-level field and PREF_FOO__BAR for the 2-level version. This is not very idiomatic, however, so I think it would be an unfortunate default behavior. The same idea could be used, somewhat awkwardly, to make kebab or snake case function by using separator("__"), but this yields environment variables like PREF__FOO_BAR and PREF__FOO__BAR, which are also not idiomatic.
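As a rough illustration of the repeated-separator workaround, the sketch below splits a variable name into levels on a configurable level separator. It uses only the standard library. The name split_levels is hypothetical, and the assumption that the prefix is joined by the same separator is mine, chosen to match the PREF__FOO_BAR form mentioned above.

```rust
/// Hypothetical helper: split a prefixed variable name into lowercase
/// level names, treating only `level_separator` as a level boundary.
fn split_levels(var: &str, prefix: &str, level_separator: &str) -> Option<Vec<String>> {
    // Assumption: the prefix is followed by one level separator.
    let prefix_pattern = format!("{}{}", prefix, level_separator);
    let rest = var.strip_prefix(prefix_pattern.as_str())?;
    let lowered = rest.to_lowercase();
    // Single underscores inside a level survive when the separator is "__".
    Some(lowered.split(level_separator).map(String::from).collect())
}
```

With a level separator of "__", PREF__FOO_BAR yields the single level foo_bar, while PREF__FOO__BAR yields the two levels foo and bar, so the two fields no longer collide. With a level separator of "_", PREF_FOO_BAR always yields foo and bar, which is the ambiguity described above.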
For the time being, using any of lowercase, UPPERCASE, PascalCase, or camelCase will allow setting values correctly, though these are not always idiomatic formats for the config files, if shared with other tooling.
If there is a will to allow enough restructuring to supply the type definition while parsing, I'd be happy to put together an implementation, after some discussion about how best to handle this. Disclaimer: I'm new to Rust, so this may take more time and/or review changes than it otherwise might.
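To give the discussion something concrete, here is one possible shape for a structure-aware lookup, sketched with only the standard library. Everything in it is hypothetical: posix_name and resolve are illustrative names, and the list of known key paths stands in for whatever type information the library would need to be given. It is not a proposal for the final API.

```rust
/// Hypothetical: derive the POSIX-style variable name for a known key path,
/// e.g. "inner-config.another-multipart-name" with prefix "PREF".
fn posix_name(prefix: &str, key_path: &str) -> String {
    let body: String = key_path
        .chars()
        .map(|c| match c {
            // Level boundaries and kebab separators both collapse to '_'.
            '.' | '-' => '_',
            other => other.to_ascii_uppercase(),
        })
        .collect();
    format!("{}_{}", prefix, body)
}

/// Hypothetical: resolve a variable name against the known key paths.
/// Ok(Some(path)) is a unique match, Ok(None) is no match, and
/// Err(paths) lists every path that claims the same variable name.
fn resolve<'a>(
    var: &str,
    prefix: &str,
    known_paths: &[&'a str],
) -> Result<Option<&'a str>, Vec<&'a str>> {
    let matches: Vec<&'a str> = known_paths
        .iter()
        .copied()
        .filter(|path| posix_name(prefix, *path) == var)
        .collect();
    match matches.len() {
        0 => Ok(None),
        1 => Ok(Some(matches[0])),
        _ => Err(matches),
    }
}
```

Given the key paths from the first example, PREF_VALUE_WITH_MULTIPART_NAME resolves to value-with-multipart-name, and PREF_INNER_CONFIG_ANOTHER_MULTIPART_NAME resolves to inner-config.another-multipart-name. For the conflicting foo-bar and foo.bar paths, PREF_FOO_BAR is reported as ambiguous rather than silently picking one, which is where a decision about default behavior would still be needed.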
Thank you a lot for this very detailed issue description.
You are of course always welcome to open pull requests, even to refactor large parts of the library! There are currently no other refactorings planned, so there shouldn't be any issues with conflicts or such! :laughing:
On the issue at hand: Yes, I agree that we should have better handling for these cases. I'd love to see patches for improving the situation! Please make sure to add (loads of) tests for every corner-case you find!
Just wanted to say I just ran into this issue and it was confusing, to say the least. Reading environment variables with two levels of nested names containing underscores, e.g. PREFIX_OUTER_NAME_INNER_NAME, would result in an "outer_name not found" error.
I have pushed a remedy for this in #380. Not a perfect solution, but a simple one that is not a breaking change either, so it may be considered.