serde icon indicating copy to clipboard operation
serde copied to clipboard

Add finalizer attribute hook to validate a deserialized structure

Open erickt opened this issue 7 years ago • 19 comments

In https://github.com/servo/rust-url/pull/252, @Manishearth is writing a serde deserializer for urls since parsing a string can be expensive. However, to do this safely from untrusted sources, they need to validate that the data is semantically correct. We unfortunately don't have an easy way of doing this without writing a custom deserializer. It'd be pretty simple to do as an attribute, however. Here's my idea:

#[derive(Serialize, Deserialize)]
#[serde(finalizer="check_greater_than_10")]
struct GreaterThan10 {
    n: usize,
}

fn greater_than_10<D>(item: GreaterThan10) -> Result<GreaterThan10, D::Error>
    where D: Deserializer
{
    if item.n > 10 {
        Ok(item)
    } else {
        Err(Error::invalid_value("less than or equal to 10"))
    }
}

erickt avatar Dec 17 '16 22:12 erickt

This seems relevant: https://github.com/Keats/validator. Let's make sure to support this library in a nice way.

dtolnay avatar Dec 28 '16 03:12 dtolnay

My hacky workaround is to

  1. don't derive Deserialize for my main data type "DataType"
  2. copy my main data type to a "FakeDataType" which has the same fields, and make this one derive Deserialize
  3. implement Deserialize for DataType by:
    1. invoking deserialize to FakeDataType
    2. validate the result
    3. convert FakeDataType to DataType.

It's a dumb hack but at least it allows me to avoid rewriting all the deser code manually.

radix avatar Jan 04 '17 22:01 radix

#626 has another use case.

I am on board with implementing exactly the approach in @erickt's comment.

dtolnay avatar Jan 07 '17 21:01 dtolnay

Another use-case from irc by @alteous

I would like to obtain a maximum integer value from a tree of structures. The context is these values are offsets into arrays, and I want to check that they are all valid ahead of time. The integer values are wrapped in Index<T> types.

oli-obk avatar Mar 29 '17 12:03 oli-obk

I believe I could also use this for an enum which is either "unknown" or some value fitting a specific pattern, but both are strings.

beamspease avatar Apr 16 '17 00:04 beamspease

I have the same usecase as in #626. I want to load shapes, and I also want to store some precomputed attributes on them that are derived from the other fields (for example radius_squared = radius * radius).

I don't want to serialize or deserialize these fields, but I want them to be present in the finished structs. The shapes could be deeply nested in another structure that is loaded, so it's very messy to try to go down the structures in a mutable fashion and set it on the structs manually.

Mange avatar Apr 17 '17 20:04 Mange

Perhaps a silly question, but why an attribute instead of a trait? I pictured:

trait SerdeFinalize: Sized {
    fn finalize<E:Error>(self) -> Result<Self, E>;
}

#[derive(Serialize, Deserialize)]
#[serde(Finalize)]
struct GreaterThan10 {
    n: usize,
}

impl SerdeFinalize for GreaterThan10 {
    fn finalize<E:Error>(self) -> Result<Self, E> {
        if self.n > 10 {
            Ok(self)
        } else {
            Err(E::invalid_value(Unexpected::Unsigned(self.n as u64), &"a number above 10"))
        }
    }
}

(And maybe #[serde(Finalize)] would no longer be needed once specialization is stable and everything could get a default no-op finalizer that's overridable if desired.)

scottmcm avatar Apr 19 '17 03:04 scottmcm

I've been working on something related to this for derive_builder. My assumption has been that the two-step process:

  1. Gather data from some outside source into an intermediate struct
  2. Finalize the domain type (perform semantic validations, generate cached values, RAII, etc.)

should be split across two different types, and that the sidecar type could be generated through a combination of macros, i.e. deriving a Builder which in turn derives Deserialize. In that world, field-level validations would be handled through types, rather than the attributes in the validator project.

I've run into a few hurdles so far:

  1. I don't know of a clean way to invoke TryFrom<...> that doesn't require writing manual deserialization code. This makes fallible conversions to domain-specific types (like "Credit Card" or "Email") very tedious to write. I'm hoping that the stabilization of TryFrom will pave the way for a more reusable solution.
  2. Surfacing errors of many different types to the outer caller requires boxing or other approaches that complicate error inspection.
  3. Getting multiple errors during a single deserialization isn't possible.

TedDriggs avatar Apr 20 '17 20:04 TedDriggs

Cross-referencing https://github.com/Keats/validator/issues/22.

It looks like one of the main concerns there is how to bubble up structured/detailed errors when validation/finalization fails. If anybody has usecases/opinions, it would be nice to feed them back there.

lucab avatar Apr 28 '17 08:04 lucab

There seem to be two situations for this:

  1. After deserialization, perform an infallible transformation `FnOnce(T) -> T
  2. After deserialization, perform a fallible transformation FnOnce(T) -> Result<T, E: serde::de::Error>

From a naming perspective, these seem exactly like map and and_then. It probably shouldn't be valid to declare both on one type/field.

@lucab I've been going back and forth on the errors front. Having that function return a serde Error is really useful for composition with this feature; maybe users can handle that conversion in their own code?

TedDriggs avatar May 12 '17 17:05 TedDriggs

Any progress on this lately?

alexreg avatar Aug 17 '17 01:08 alexreg

It would be nice to have a hook like the one mentioned by @scottmcm. With validator, I could do something like

#[derive(Debug, Validate, Deserialize)]
#[serde(Finalize)]
struct SignupData {
    #[validate(email)]
    mail: String,
    #[validate(url)]
    site: String,
    #[validate(length(min = "1"), custom = "validate_unique_username")]
    #[serde(rename = "firstName")]
    first_name: String,
    #[validate(range(min = "18", max = "20"))]
    age: u32,
}

// This 
impl SerdeFinalize for SignupData {
    fn finalize<ValidationErrors>(self, deserialization_errors: Option<HashMap<Cow<String>, serde::de::Error>>) -> Result<Self, ValidationErrors> {
      if let Some(de_errors) = deserialization_errors {
           // format errors and return without doing any validation?
      }
      // the current Validate::validate fn would be impl here instead
    }
}

// And use it later
let data: SignupData = serde_json::from_str(data)?;

This way it's a one step action for users. The part I'm not sure about is if a user wants to hook into the serde error messages to add i18n for example or to have only one kind of error. I believe https://docs.serde.rs/serde/de/trait.Error.html would be enough for that but those errors will need to be passed to whatever Finalize function is so the validation library can format them as seen in the snippet above (type completely made up and probably incorrect).

Keats avatar Aug 18 '17 07:08 Keats

We are brainstorming some ways to generalize this in #1118.

dtolnay avatar Dec 17 '17 16:12 dtolnay

Note that the try_from attribute on containers may often suffice for this ( https://serde.rs/container-attrs.html#from ). This issue's original example can be written as so:

#[derive(Serialize, Deserialize)]
#[serde(try_from="usize")]
struct GreaterThan10(usize);

impl TryFrom<usize> for GreaterThan10 {
    type Error = &'static str;
    fn try_from(value: usize) -> Result<Self, Self::Error> {
        if value > 10 {
            Ok(GreaterThan10(value))
        } else {
            Err("value not greater than 10")
        }
    }
}

bstrie avatar Aug 27 '19 18:08 bstrie

@bstrie's workaround is pretty good. For custom structs, you can do make a shadow type, implement deserialize on it, and do validations within the TryFrom implementation.

use serde::Deserialize;
use serde_json;
// The target is to not allow deserialization if option1 & option2 are none
#[derive(Deserialize, Debug)]
#[serde(try_from = "MyTypeShadow")]
pub struct MyType {
    option1: Option<usize>,
    option2: Option<usize>,
}

// The shadow type only has to implement Deserialize
#[derive(Deserialize)]
struct MyTypeShadow {
    option1: Option<usize>,
    option2: Option<usize>,
}

pub struct MyTypeValidationError;

// The error type has to implement Display
impl std::fmt::Display for MyTypeValidationError {
    fn fmt(&self, formatter: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(formatter, "option1 and option2 cannot be null")
    }
}

impl std::convert::TryFrom<MyTypeShadow> for MyType {
    type Error = MyTypeValidationError;
    fn try_from(shadow: MyTypeShadow) -> Result<Self, Self::Error> {
        let MyTypeShadow { option1, option2 } = shadow;
        if option1.is_none() && option2.is_none() {
            return Err(MyTypeValidationError);
        }
        // Any other validations
        Ok(MyType { option1, option2 })
    }
}

fn main() {
    // This will return an Err
    println!("{:?}", serde_json::from_str::<MyType>(r##"{}"##));
    // This will work
    println!(
        "{:?}",
        serde_json::from_str::<MyType>(r##"{"option1": 20}"##)
    );
}

menixator avatar Aug 29 '20 11:08 menixator

Would a PR implementing the approach in the OP (https://github.com/serde-rs/serde/issues/642#issue-196247375) be accepted (a finalizer= attribute on the struct level), or is this issue blocked on something else?

sffc avatar May 23 '22 15:05 sffc

A finalizer fn for deserialization is also useful for when you were forced to do a #[serde(skip)] on a field due to it being from a 3rd party lib with no serialization support, so you have to reinitialize the data once the struct is deserialized (or some other post-setup code)

MolotovCherry avatar Jun 16 '22 18:06 MolotovCherry

Another use case: Discord API libraries serenity and twilight. Some JSON payloads omit important data from nested objects because it can be inferred from top-level fields. But when deserializing into a Rust type, we want to include that important data in the nested object itself.

Serenity's current workaround is a manual Deserialize impl that deserializes into a serde_json::Value, patches that, and then continues to deserialize into the target type (1). In other places, it manually parses from the patched serde_json::Value into the target type directly (2, 3)

Twilight's current workaround is a full blown 710LOC manual Deserialize impl (1)

kangalio avatar Oct 26 '22 20:10 kangalio

Just skimmed over this ticket, and IIUC, there's no built-in support for this in serde 1.0.x, correct?

The TryFrom pattern in https://github.com/serde-rs/serde/issues/642#issuecomment-683276351 looks like a great option for my use case (which is post-deserialization validation).

Because this pattern seems useful and somewhat common, I just submitted https://github.com/serde-rs/serde-rs.github.io/pull/148 to add it to https://serde.rs .

nathan-at-least avatar May 08 '23 19:05 nathan-at-least