Indeterministic errors when deserializing adjacently tagged enums storing integers based on string values
We have come across an error occurring sometimes, when we want to deserialise a configuration containing an adjecently tagged enum. The erorr occurs if the source is a string, i.e. for example coming from an environment variable.
This is a small example to reproduce:
use config::Config;
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(tag = "type", content = "value")]
enum A {
V1(u64),
V2(String),
}
fn main() {
let config = Config::builder()
.add_source(config::Environment::with_prefix("APP").separator("_"))
.build()
.unwrap();
let a = config.try_deserialize::<A>().unwrap();
}
Then I run this with the following environment variables:
APP_type=V1 APP_value=42 cargo run
This errors 50% of the time and runs through successfully the other 50%. The error message is:
called `Result::unwrap()` on an `Err` value: invalid type: string "42", expected u64
I debugged this and the error occurs if the value key gets stored in the internal hashmap before the type key, therefore the indeterminim stems from the indeterministic nature of the ordering in hashmaps. This might be related to https://github.com/rust-cli/config-rs/issues/442 but I don't think a solution is to just make it deterministic. I would guess that either it is supported to parse strings into variants that store integers, or it is not, it should not depend on the ordering of the tag versus the content.
If one wants to reproduce this without environment variables one can use:
let config = Config::builder()
.set_default("type", "V1")
.unwrap()
.set_default("value", "42")
.unwrap()
.build()
.unwrap();
For a full standalone reproduction case
#!/usr/bin/env nargo
---
[dependencies]
config = { version = "*", features = [] }
serde = { version = "1", features = ["derive"] }
---
#[derive(serde::Deserialize, Debug)]
#[serde(tag = "type", content = "value")]
enum A {
V1(u64),
V2(String),
}
fn main() {
let config = config::Config::builder()
.set_default("type", "V1")
.unwrap()
.set_default("value", "42")
.unwrap()
.build()
.unwrap();
let a = config.try_deserialize::<A>().unwrap();
}
Whats baffling about this is if I play with the order other ways, this doesn't reproduce
#!/usr/bin/env nargo
---
[dependencies]
config = { path = "../config-rs", features = [] }
serde = { version = "1", features = ["derive"] }
---
#[derive(serde::Deserialize, Debug)]
#[serde(tag = "type", content = "value")]
enum A {
V1(u64),
V2(String),
}
fn main() {
let config = config::Config::builder()
.add_source(config::File::from_str(
r#"
{
"type": "V1",
"value": 42
}
"#,
config::FileFormat::Json,
))
.build()
.unwrap();
dbg!(&config);
let a = config.try_deserialize::<A>().unwrap();
let config = config::Config::builder()
.add_source(config::File::from_str(
r#"
{
"value": 42,
"type": "V1"
}
"#,
config::FileFormat::Json,
))
.build()
.unwrap();
dbg!(&config);
let a = config.try_deserialize::<A>().unwrap();
let config = config::Config::builder()
.set_default("type", "V1")
.unwrap()
.set_default("value", "42")
.unwrap()
.build()
.unwrap();
dbg!(&config);
let a = config.try_deserialize::<A>().unwrap();
}
[config-720.rs:28:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"value": Value {
origin: None,
kind: I64(
42,
),
},
"type": Value {
origin: None,
kind: String(
"V1",
),
},
},
),
},
}
[config-720.rs:43:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"value": Value {
origin: None,
kind: I64(
42,
),
},
"type": Value {
origin: None,
kind: String(
"V1",
),
},
},
),
},
}
[config-720.rs:53:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"type": Value {
origin: None,
kind: String(
"V1",
),
},
"value": Value {
origin: None,
kind: String(
"42",
),
},
},
),
},
}
vs
[config-720.rs:28:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"type": Value {
origin: None,
kind: String(
"V1",
),
},
"value": Value {
origin: None,
kind: I64(
42,
),
},
},
),
},
}
[config-720.rs:43:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"type": Value {
origin: None,
kind: String(
"V1",
),
},
"value": Value {
origin: None,
kind: I64(
42,
),
},
},
),
},
}
[config-720.rs:53:5] &config = Config {
defaults: {},
overrides: {},
sources: [],
cache: Value {
origin: None,
kind: Table(
{
"value": Value {
origin: None,
kind: String(
"42",
),
},
"type": Value {
origin: None,
kind: String(
"V1",
),
},
},
),
},
}
thread 'main' (92816) panicked at config-720.rs:54:43:
Whats baffling about this is if I play with the order other ways, this doesn't reproduce
...
I think the difference is that if you parse json it automatically tries to parse as number, so if you look at the kind value it is I64 in this case. The error only happens if it gets parsed as String first. This happens for environment variables if you don't use try_parse(true) and if you directly set a string value.
As far as my debugging goes the problem is that the derived Deserialise for the A enum has two different paths depending on if the tag comes first or the content comes first. It seems like in the path that the content comes first it first deserialises the content to some general type and then later deserialises further to the type that is expected in the variant. But because it deserialises to the general type first it is stored as a String and once the variant is known and it gets deserialised again it calls the deserialise of this general type and not on the config::Value anymore. Since the config::Value has an implementation that converts strings into numbers if needed it works there but not if we first converted the string into this general type.
I didn't realiaze serde_derive put in that fast-path for having the tag first. That makes sense.
The problem then becomes how do we deal with this. As a user, you could enable Environment::try_parsing so a more specific data type gets stored. I was considering changing that in a future version but I'm not finding where I was doing so and I worry that that plan would break this situation.
I didn't realiaze
serde_deriveput in that fast-path for having the tag first. That makes sense.The problem then becomes how do we deal with this. As a user, you could enable
Environment::try_parsingso a more specific data type gets stored. I was considering changing that in a future version but I'm not finding where I was doing so and I worry that that plan would break this situation.
Setting try_parsing(true) is only half a solution as it gives the same problems if you want to parse a String and the content is a number. So with my example above if I do:
use config::Config;
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(tag = "type", content = "value")]
enum A {
V1(u64),
V2(String),
}
fn main() {
let config = Config::builder()
.add_source(config::Environment::with_prefix("APP").separator("_").try_parsing(true))
.build()
.unwrap();
let a = config.try_deserialize::<A>().unwrap();
}
Now it does not error anymore if I run
APP_type=V1 APP_value=42 cargo run
but it does error (with 50% probability) again if I run
APP_type=V2 APP_value=42 cargo run
The error is now the other way around:
called `Result::unwrap()` on an `Err` value: invalid type: integer `42`, expected a string
Btw. the last example I gave that expects a string but gets an integer is also a problem if you parse from json or any other typed format:
use config::Config;
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(tag = "type", content = "value")]
enum A {
V1(u64),
V2(String),
}
fn main() {
let config = Config::builder()
.add_source(config::File::from_str(
r#"
{
"value": 42,
"type": "V2"
}
"#,
config::FileFormat::Json,
))
.build()
.unwrap();
let a = config.try_deserialize::<A>().unwrap();
}
If I run this it errors again with 50% probability. Note that the order of value and type doesn't make a difference here as the really relevant order is the internal hash map which seems to not just depend on the original order but on something else.
Ok the integer to string problem can be circumvented by adding quotation marks (") in json (for example "value": "42") and for environment variables too (with proper escaping: APP_type=V2 APP_value="\"42\"" cargo run but the indeterminism if we don't add the quotation marks is still not nice.
Maybe a dumb question, is untagged enum supported for integers? The below snippet always fails when port is u16 but works if it's String.
AAA__A__PORT=1000
use config::Config as Cfg;
use serde::Deserialize;
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum A {
VariantA { port: u16 },
}
#[derive(Debug, Deserialize)]
struct Settings {
a: A,
}
fn main() {
let config = Cfg::builder()
.add_source(
config::Environment::with_prefix("AAA")
.prefix_separator("__")
.separator("__"),
)
.build()
.unwrap();
let config = config.try_deserialize::<Settings>().unwrap();
println!("{:?}", config);
}