rust-csv
rust-csv copied to clipboard
Deserializing bool fields fails if the field is not exactly "true" or "false".
What version of the csv crate are you using?
1.0.2
Briefly describe the question, bug or feature request.
When deserializing a TSV file, if a boolean column's values are not exactly true or false, the deserialization fails. Some editors, like Google Sheets use "TRUE" and "FALSE", or "True" and "False" instead, making files edited with them throw an error when deserializing.
Include a complete program demonstrating a problem.
Here is a working example of the problem. The works.tsv file works, but the fails.tsv throws an error in each row.
csv-bug.tar.gz
What is the expected or desired behavior of the code above?
To parse correctly TRUE, True, FALSE and False as booleans.
Can you provide a specification for which values should be interpreted as booleans? What about TrUe? Or falsE? Or 1? Or 2?
I might be able to get on board with a case insensitive comparison, but the actual scope of what is a boolean is interesting. It kind of seems like 1 and 0 should also be interpretable as booleans.
Note that you can work around this pretty easily with a custom deserialize_with function.
I haven't been programming for too long, so this list may be a bit incomplete, but I've seen the following values interpreted as booleans:
true/false, with any combination of lowercase/uppercase, liketrUeorfAlSe.0/1, being0 = falseand1 = true.yes/no, with any combination of lowercase/uppercase.t/f, beingt = trueandf = false.y/n, beingy = yes/trueandn = no/false.
Those are more or less all the interpretations I've seen in the wild but the first two were by far the most common. I've barely seen the other ones being used. So an specification of What can be considered an interpretable boolean would be:
true/false: or any combination of lowercase/uppercase, liketrUeorfAlSe.0/1: being0 = falseand1 = true.
I have this very same issue but in the other direction: I need to write CSV files that I can load into a MySQL database and boolean fields need to be represented as 0/1 for this to work. But false/true are hard-coded in the writer. Is there a convenient way to customize this?
Yes. You can use serde's attributes to specify a different serialization function. Or use a custom type.
Neither adding attributes to every boolean field nor using a custom type for something as basic as a boolean sounds too appealing. Customizing the writer with the desired string representation would be ideal for me. But I'll have another go at the Serde docs. Thanks a lot!
Was kind of hoping this was built in... I think this probably solves the original question, though?
use serde::{Deserialize, Deserializer, Serialize};
use serde::de::{self, Unexpected};
/// Converts a string to a boolean based on truthy and falsy values
fn bool_from_str<'de, D>(deserializer: D) -> Result<bool, D::Error>
where
D: Deserializer<'de>,
{
match String::deserialize(deserializer)?.to_lowercase().as_str() {
"t" | "true" | "1" | "on" | "y" | "yes" => Ok(true),
"f" | "false" | "0" | "off" | "n" | "no" => Ok(false),
other => Err(de::Error::invalid_value(
Unexpected::Str(other),
&"Must be truthy (t, true, 1, on, y, yes) or falsey (f, false, 0, off, n, no)",
)),
}
}
#[derive(Clone, Debug, Deserialize, Serialize)]
struct Foo {
/// Some field
#[serde(deserialize_with = "bool_from_str")]
bar: bool
}
Original version: Playground
@brianbruggeman As a minor refinement, I'd consider changing String::deserialize() to Cow::<str>::deserialize() to avoid allocations when possible (which should be in most cases). Playground.
Edit: Cow<str> doesn't help when used this way because its deserialize() always allocates. Serde does have code that special-cases Cow, but it is part of the proc-macro, so Cow must be used as a field type.
@brianbruggeman As a minor refinement, I'd consider changing
String::deserialize()toCow::<str>::deserialize()to avoid allocations when possible (which should be in most cases). Playground.
@hniksic Unfortunately, that doesn't change anything. A Cow is always deserialized into its Owned variant (https://docs.rs/serde/latest/src/serde/de/impls.rs.html#1823-1835), which for Cow<'_, str> means a String. What you likely want is a transient str, but for this you must use a Visitor. The visit_str function can then return the bool.
@jonasbb Thanks for the correction. Interestingly, it appears to be possible to force borrowing using #[serde(borrow)], but that doesn't look like it generalizes to this case.
It may not really matter. The to_lowercase is problematic as it will allocate a String. I suppose it is possible to move around that with more optional truthy/falsey values, but I suspect that may make maintenance more unwieldy (true, True, tRue, tRUE, ...).
To redeem myself, here is the visitor-based version. It calls to_lowercase() only in the error path which is (IMHO) a reasonable compromise between efficiency and maintainability.
use serde::de::{self, Unexpected, Visitor};
use serde::{Deserialize, Deserializer, Serialize};
/// Converts a string to a boolean based on truthy and falsy values.
///
/// Designed to be used as #[serde(deserialize_with = "bool_from_str")]
fn bool_from_str<'de, D>(deserializer: D) -> Result<bool, D::Error>
where
D: Deserializer<'de>,
{
struct BoolVisitor;
impl Visitor<'_> for BoolVisitor {
type Value = bool;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(
formatter,
"truthy (t, true, 1, on, y, yes) or falsey (f, false, 0, off, n, no) string"
)
}
fn visit_str<E: de::Error>(self, s: &str) -> Result<bool, E> {
match s {
"t" | "T" | "true" | "True" | "1" | "on" | "On" | "y" | "Y" | "yes" | "Yes" => {
Ok(true)
}
"f" | "F" | "false" | "False" | "0" | "off" | "Off" | "n" | "N" | "no" | "No" => {
Ok(false)
}
other => {
// handle weird mixed-case spellings like tRue or nO
match other.to_lowercase().as_str() {
"true" | "on" | "yes" => Ok(true),
"false" | "off" | "no" => Ok(false),
other => Err(de::Error::invalid_value(Unexpected::Str(other), &self)),
}
}
}
}
}
deserializer.deserialize_str(BoolVisitor)
}
If anyone is curious, I've also recently added a tiny crate, serde-this-or-that which hopefully should handle converting numbers and strings (including the edge cases above) to a bool type. It is available on crates.io as well.