serde_with icon indicating copy to clipboard operation
serde_with copied to clipboard

Stack overflow when using `OneOrMany` on `Vec<Self>` in any Rust type (struct / enum)

Open m-kuzmin opened this issue 8 months ago • 6 comments

The most minimal example:

#[test]
fn de_vec_self() {
    use serde_json::json;
    use serde_with::{formats::PreferMany, formats::PreferOne, serde_as, OneOrMany};

    let json = json!(/* literally anything */);

    #[serde_as]
    #[derive(Deserialize)]
    struct VecSelf(#[serde_as(as = "OneOrMany<_, PreferMany>")] Vec<Self>);

    let _: VecSelf = serde_json::from_value(json).unwrap();
}
running 1 test

thread 'de_vec_self' has overflowed its stack
fatal runtime error: stack overflow

I started with an enum which is similar to serde_json::Value, but has more restrictions on what variants are allowed. It turns out you don't even need an enum, a newtype with the as=OneOrMany attr is sufficient to always trigger a stack overflow regardless of the value being parsed.

m-kuzmin avatar Apr 28 '25 15:04 m-kuzmin

serde_with = "3.12.0"

m-kuzmin avatar Apr 28 '25 15:04 m-kuzmin

Thanks for reporting the issue. Unfortunately, I don't think I will be able to help here, except to maybe document this special case and avoid matching OneOrMany with Vec<Self>.

Conceptually, you built an unbounded recursion. This recursion then causes the stack overflow. When you deserialize the VecSelf in JSON this doesn't consume any tokens. Next the OneOrMany comes in and buffers all available input and then tries to deserialize Self from it. But Self is VecSelf and we are back in the beginning without having consumed anything, thus there is an unbounded recursion.

I initially thought this might be partially caused by serde_json (e.g., [[[]],[]], since here newtypes are transparent, but I get the same stackoverflow with ron, where you can even get the struct name of newtypes included VecSelf([VecSelf([VecSelf([])]),VecSelf([])]).

jonasbb avatar Apr 28 '25 20:04 jonasbb

Perhaps I should have included the real example. I got distracted by chasing a more minial version and didn't realize that the example above is actually impossible to deserialize (unless it's []). What I was trying to achieve is basically a serde_json::Value:

enum PropertyValue {
  String(String),
  Int(i32),
  ...
  Array(Vec<Self>)
}

Unfortunately, I do not have any deserialize implementations for the Value-like type because I only care that serialize reduces the vec to a string, not deserialize. Serialize does not have this recursion problem by the way.

The solution should be simple - deserialize a single element first. If that fails, deserialize a sequence. Because I am lazy and do not want to write yet another impl Deserialize, I just made this helper type, which I used for deserializing HashMap<String, Vec<String>>:

// impl Deserialize for Newtype(HashMap)

#[derive(Deserialize)]
#[serde(untagged)]
enum Veclike {
    One(String),
    Many(Vec<String>),
}

// deserialize Veclike
// match on it's variant, create a `vec![string]` if necessary, and insert into the `HashMap`

Offtopic: I could probably use Veclike as the map's value but I don't think it's much different for the user. Either they match on the variants or they just do a .as_slice() { [single] => ....

m-kuzmin avatar Apr 28 '25 22:04 m-kuzmin

The RFC I'm implementing is for jCard (the vicardi crate) and basically, a good implementation is supposed to use JSON strings whenever there is a single element in the array:

-["adr", {"lang": ["en"]} "text", ["..."]]
+["adr", {"lang":  "en" } "text",  "..." ]
                  ^~~~~~          ^~~~~~~

I cannot use the macro for {"lang": "en"} because I cannot annotate a hashmap and I do not want to leak serde newtypes to the user. But I think it works at that position. So instead, I just wrap the entire hashmap in a newtype so I can implement custom rules for deserializing it and use Veclike inside that custom impl.

The issue is in the ["..."]. That element, PropertyValue, can be a lot of different things including arrays of Self. Technically, the RFC says that the elements of that array are the same type, but I have a feeling that it's best to just let the user figure out the type of each element, rather than failing to deserialize some weird jCard extension.

m-kuzmin avatar Apr 28 '25 22:04 m-kuzmin

Could you please post the full types that you are using and that are causing you the issue?

The solution should be simple - deserialize a single element first. If that fails, deserialize a sequence. Because I am lazy and do not want to write yet another impl Deserialize, I just made this helper type, which I used for deserializing HashMap<String, Vec<String>>:

This way, single first then sequence, is already the way it is implemented.

https://github.com/jonasbb/serde_with/blob/1b395677fc9565c3191c9e3668f5ed3104560562/serde_with/src/de/impls.rs#L1553-L1564

jonasbb avatar May 03 '25 21:05 jonasbb

https://github.com/uzinfocom-org/vicardi/blob/main/src%2Flib.rs#L203

This is the type I've tried to use one of many for.

m-kuzmin avatar May 03 '25 23:05 m-kuzmin