json icon indicating copy to clipboard operation
json copied to clipboard

Deserialization of 128 bit integers fail when used with untagged variants

Open mikonieminen opened this issue 4 years ago • 10 comments

If I have an enum that is deseriliazed as untagged variant and one of the variants hold 128bit integer, then the deserialization fails. This is quite unexpected while everything works as expected with tagged variant.

I've created a simple test case that shows the problem: https://github.com/mikonieminen/json/tree/128bit-bug

I'm not sure if this is directly related to serde_json, but since my use case is JSON specific, I decided to report here. Also, I think json_simd author has run into this same issue: https://github.com/serde-rs/serde/issues/1717

mikonieminen avatar Jan 02 '21 16:01 mikonieminen

I ran into the same issue, however this seems to be failing to deserialize into the correct variant for any integer / float types. The only workaround I found was to deserialize it into the Number type.

#[cfg(test)]
mod tests {
    use serde::Deserialize;
    use serde_json::Number;

    #[derive(Debug, Deserialize)]
    #[serde(untagged)]
    enum Data {
        A { x: i32 },
        B { x: i64 },
        C { x: f32 },
        D { x: f64 },
    }

    #[derive(Debug, Deserialize)]
    #[serde(untagged)]
    enum DataWorks {
        A { x: i32 },
        B { x: i64 },
        C { x: f32 },
        D { x: f64 },
        E { x: Number },
    }
    #[test]
    fn test_deserializer() {
        let value = serde_json::from_str::<DataWorks>(r#" {"x": 1} "#).unwrap();
        // prints value: E { x: Number(1) }
        println!("value: {:?}", value);
        // This fails with thread panicked at 'called `Result::unwrap()` on an `Err` value: Error("data did not match any variant of untagged enum Data", line: 0, column: 0)
        let value = serde_json::from_str::<Data>(r#" {"x": 1} "#).unwrap();
    }
}

atrniv avatar Feb 16 '21 05:02 atrniv

I think you can deserialize 128bit integers using arbitrary_precision, but there are also some issues in the serde side when trying to deserialize 128bit integers. Of course, using arbitrary_precision is less than optimal when it comes to performance and when looking at the code, I would expect true 128bit support to be in place when compiled with 128bit capable Rust version. There are different macros and configs, but fixing serde_json to support it properly was a bit too complicated for me to fix at this point. I had a look at it when I was working on my serde side PR for this: https://github.com/serde-rs/serde/pull/1979

@dtolnay maybe you can comment this topic? And also, if someone could have a look at the PR I mentioned above, it would be great.

mikonieminen avatar Feb 16 '21 08:02 mikonieminen

An even shorter test that fails?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=760962559bf62301fe2b53772cca0d91

use serde::Serialize;
use serde::Deserialize;

#[derive(Deserialize,Serialize)]
#[serde(untagged)]
enum SubscriptionResult {
    Whatever {
        somethingData: u64,  // doesn't work if changed to u128
    },
}

fn main() {
    let t = SubscriptionResult::Whatever {somethingData: 334};
    println!("{}", serde_json::to_string_pretty(&t).unwrap());  // Works for both u64 and u128

    // Does not work for u128, but works fine for u64
    let b : &str = r###"{"somethingData":378842947}"###;
    println!("{}", serde_json::from_str::<SubscriptionResult>(b).is_ok());
    println!("{}", serde_json::from_str::<SubscriptionResult>(serde_json::to_string_pretty(&t).unwrap().as_str()).is_ok());
}

yuzisee avatar Apr 15 '21 12:04 yuzisee

I had to debug this issue today. A struct had a u128 field that worked fine with normal ser/de derive macros but when used in an untagged enum via a NewType it stopped working and broke with an cryptic error message. It would help to have the 128 bit limitation documented somewhere.

akinnane avatar Jul 20 '21 16:07 akinnane

Ran into this too, ouch. To see why this isn't fixed yet, and what the next steps would be to fix it see: https://github.com/serde-rs/serde/pull/1979#pullrequestreview-735514847 and https://github.com/serde-rs/serde/issues/1183 .

It sounds like maintainers have an idea of what direction they'd like to go, but unfortunately, it's non-trivial, and some of it may only be acceptably well implemented after "associated type defaults" reach stabilization (https://github.com/rust-lang/rust/issues/29661), see https://github.com/serde-rs/serde/issues/1183#issuecomment-452033525 .

Now, although this might not be possible to neatly solve yet, it would be neat if there was a simple screaming error message "you're using 128-bit integers with untagged, this won't work" does anyone know if that would be feasible?

If not, the next best thing would be to at least note in the docs of serde_json or serde/untagged that you should be wary of combining the two with u128 / i128.

How does that sound?

Finally, no one posted a workaround yet 😬 , I'll try to find one and report back.

alextes avatar Sep 02 '22 07:09 alextes

Dove pretty deep trying to learn about enums visitors etc. Ran out of time. Went for the much uglier but simpler "just have serde fail once and handle the failure in userland with a second attempt with a second type". Then struggled with the ugliness enough and ended up doing a harsher migration where data in production turned a couple of minutes stale.

Anyhow, skip the deserialize_enum stuff, that's for tagged only. I think you'll have to implement your own deserializer, deserialize_any, then write something custom to decide which variant you're looking at, and then you could hand-off to a derived implementation again with some #[serde(deserialize_with="from_u128_string")].

Makes me want to write a simpler json lib which is probably a terrible idea 😂 .

alextes avatar Sep 02 '22 16:09 alextes

I know this issue is a bit stale, but anyone have an example of a workaround handy? I've spent some time fiddling with a custom deserializer and I don't feel like I'm getting close. Thanks!

chanks avatar Jun 05 '23 14:06 chanks

Sure @chanks and others landing here. How's this?

playground

use serde::{Deserialize, Deserializer};


pub fn from_u128_str<'de, D>(deserializer: D) -> Result<u128, D::Error>
where
    D: Deserializer<'de>,
{
    let s: String = Deserialize::deserialize(deserializer)?;
    Ok(u128::from_str_radix(&s, 10).unwrap())
}

#[derive(Debug, Deserialize)]
pub struct Block {
    pub number: i32,
    #[serde(deserialize_with = "from_u128_str")]
    pub total_difficulty: u128,
}

pub fn main() {
    let block_json = serde_json::json!({
        "number": 0,
        "total_difficulty": "922337204000000000000000000"
    });

    let block = serde_json::from_value::<Block>(block_json).unwrap();

    dbg!(block);
}

alextes avatar Jun 09 '23 10:06 alextes

While @alextes' solution is workaround when you can control the data, it is not a solution to the problem I've raised.

If you can pass big numbers in strings, then the deserialization works and what @alextes suggested is a good solution. I highly recommend passing big numbers in strings, because this deserialization problem is also with other languages. At least used to be there with Python. While it didn't fail, it simply truncated the number. That bug was pretty painful to find.

Now, if you have no control over the received JSON, then it's much more painful, because I haven't found actual solution, except skip the field. Maybe custom visitor could do that, but I haven't tried it.

Here is a playground for my test case that I pointed out in the original message: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c3f52c6df77407b72d2702f75c01a8da

mikonieminen avatar Jun 12 '23 06:06 mikonieminen

It seems you can go around this problem by introducing a custom visitor. Here is a playground example. It's a toy example and in real world the visitor is probably quite nasty to write, but at least this should give an idea how to do it.

@chanks I hope this helps.

mikonieminen avatar Jun 12 '23 07:06 mikonieminen