Error deserializing integer from scientific notation
Hi,
I have a data field that's an unsigned integer but sometimes gets formatted using scientific notation by my data provider. serde-json fails to deserialize the data-field when it's in scientific notation as it sees the field as a float:
use serde::Deserialize;
use serde_json;

#[derive(Deserialize, Debug)]
struct Foo {
    z: usize
}

fn main() {
    let json = r#"{"z":1e+06}"#;
    serde_json::from_str::<Foo>(json).unwrap();
}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: floating point `1000000`, expected usize", line: 1, column: 10)', src/main.rs:11:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Playground link here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8cc640da1d7465f24b715cefed2bdc71
A little bit smaller, but same error:
assert_eq!(Some(1_000_000), serde_json::from_str::<usize>("1e+06").ok());
The problem seems to be this line: https://github.com/serde-rs/json/blob/6c6823a6b3f97e4482b07eca02a991f30b19dfee/src/de.rs#L429
It should not try to parse it as float directly. Even something like "1.1e+3" should be parseable as usize. I'm not sure if one should try to parse the number, determine if parsing it into usize is possible and then return a result on top of that or what else to do.
@hellow554 JSON has no concept of integers, only numbers, and everything is treated as a 64-bit, double-precision, IEEE-754 floating-point number. Integer parsing is provided as an extension by most JSON parsers, but nearly every integer parser rejects exponent notation (including Rust, Python, and basically every other language I've ever tried, from COBOL to FORTRAN77). For a curated list of every language I've done, you can check here.
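The gap between the two kinds of parser can be reproduced with Rust's standard library alone. A minimal sketch; the helper name `parse_both` is made up for illustration:

```rust
/// Hypothetical helper: feed the same token to the standard integer parser
/// and to the standard float parser, and report what each one makes of it.
fn parse_both(token: &str) -> (Option<usize>, Option<f64>) {
    (token.parse::<usize>().ok(), token.parse::<f64>().ok())
}
```

For `"1e+06"` the integer side comes back empty while the float side yields 1000000.0; for `"1000000"` both sides succeed.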
Therefore, I don't feel this is an issue with serde-json, but rather an esoteric use-case you have. What I would recommend is to parse it as a float, round toward zero using f64::trunc(), and treat that as a usize. If you're worried about rounding issues (which could potentially lead to indexing issues, if you have a usize), I'd recommend only parsing the float if it has fewer than 16 digits (so it can fit into the 53 significant bits of a 64-bit float), and an exponent <= 22 (the maximum that can be exactly represented as well).
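A rough sketch of that float-then-convert idea, using only the standard library. The name `float_to_usize` is hypothetical, and this variant is stricter than plain truncation: it assumes a fractional value should be rejected rather than rounded, and it checks the value against 2^53 instead of counting digits.

```rust
use std::convert::TryFrom;

/// 2^53: at or above this magnitude an f64 can no longer represent every integer.
const EXACT_LIMIT: f64 = 9_007_199_254_740_992.0;

/// Hypothetical helper: turn a float produced by a JSON parser into a usize,
/// but only when nothing is lost on the way.
fn float_to_usize(value: f64) -> Option<usize> {
    // NaN, infinities and negative numbers can never be a usize.
    if !value.is_finite() || value < 0.0 {
        return None;
    }
    // Assumption: a fractional part is an error. Replace this check with
    // `value.trunc()` if silently rounding toward zero is acceptable.
    if value.fract() != 0.0 {
        return None;
    }
    // Stay within the range where the float is known to be exact.
    if value >= EXACT_LIMIT {
        return None;
    }
    // The cast to u64 is exact here; try_from guards targets with a narrower usize.
    usize::try_from(value as u64).ok()
}
```

With serde, one way to use such a helper would be from a function named in a `#[serde(deserialize_with = "...")]` attribute that first deserializes the field as an `f64`.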
For reference, here's the JSON specification. From the railroad diagram and the other specification material, it is clear that, as in JavaScript, there is no concept of integers in JSON, and any ability to parse native integers is an extension.
The maximum safe integer to parse in JSON is therefore Number.MAX_SAFE_INTEGER, or (1<<53) - 1, or 9007199254740991. If you contemplate needing any value higher than that, and have control over how the JSON formats are produced, I highly recommend storing the values as either:
- A string (for example, "900719925474099576").
- An array of two 32-bit numbers, high word first (for example, [209715200, 376] for the example above).
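To make the two-word encoding concrete, here is a small sketch that splits a `u64` into 32-bit halves and joins them again. The function names and the high-word-first order are assumptions chosen for illustration; `survives_f64` shows why the value from the string example cannot safely travel as a bare JSON number.

```rust
/// Split a u64 into [high, low] 32-bit words. Each word is far below 2^53,
/// so it passes through a double-precision JSON number unchanged.
fn split_u64(value: u64) -> [u32; 2] {
    [(value >> 32) as u32, value as u32]
}

/// Rebuild the original u64 from [high, low] words.
fn join_u64(words: [u32; 2]) -> u64 {
    (u64::from(words[0]) << 32) | u64::from(words[1])
}

/// True when converting the value to f64 does not change it.
/// The comparison is done in u128 so that a float rounded up to 2^64 is not
/// mistaken for u64::MAX.
fn survives_f64(value: u64) -> bool {
    (value as f64) as u128 == u128::from(value)
}
```

For 900719925474099576 the split gives high word 209715200 and low word 376, and `survives_f64` returns false, whereas 9007199254740991 survives intact.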
@Alexhuszagh, thanks for the helpful context! The value is being sent by a vendor, so I don't have direct control over the formatting of the value, but I will try out your truncation suggestion. Thanks again!
> Therefore, I don't feel this is an issue with serde-json, but rather an esoteric use-case you have.
Just commenting because I encountered the issue recently and found this out.
- I think the error message would be better if it kept the original string it tried to deserialize. With big payloads the error is hard to track down when the message makes it look like a perfectly valid integer was rejected as an integer:
  invalid type: floating point `1671193560`, expected i64 at {locus}.
- Sadly, it might not be as esoteric a use case as we might want. For example, the AWS API always returns timestamps as integers in scientific notation (even if the documentation examples don't show it):
{"Messages":[],"MetricDataResults":[{"Id":"expr_0","Label":"expr_0","StatusCode":"Complete","Timestamps":[1.67119356E9,1.6711935E9],"Values":[8.11615297282456,8.48669495866197]}]}
I know I should be reporting an issue with the time crate, or directly with Amazon, for this particular instance. I just wanted to show an example of a not-so-contrived use case where I can't really do anything and receive integers in scientific notation.
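For timestamps like the ones in that payload, a caller-side workaround might look like the sketch below. It uses only the standard library, `parse_epoch_seconds` is a made-up name, and it assumes the timestamps are whole seconds, so any token with a fractional part is rejected.

```rust
/// Hypothetical helper: read a whole number of seconds from a token that may
/// be written in scientific notation, such as "1.67119356E9".
fn parse_epoch_seconds(token: &str) -> Option<i64> {
    // 2^53: beyond this magnitude the float may no longer be an exact integer.
    const EXACT_LIMIT: f64 = 9_007_199_254_740_992.0;

    // Rust's float parser accepts both plain and exponent notation.
    let value: f64 = token.parse().ok()?;

    // Refuse NaN, infinities, fractional values and anything too large to trust.
    if !value.is_finite() || value.fract() != 0.0 || value.abs() >= EXACT_LIMIT {
        return None;
    }
    Some(value as i64)
}
```

Applied to the tokens above, `"1.67119356E9"` becomes 1671193560, while an entry from `Values` such as `"8.11615297282456"` is refused.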
Thanks for having the whole explanation here though!!
I also ran into this issue. Personally - I find it strange that Serde would serialize into a format it cannot parse. I expect the invariant from my parsers:
decode::<X>(encode::<X>(x)) == Ok(x)
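That invariant can be written down as a small property check. The sketch below is not serde code: it uses the standard `Display` and `FromStr` traits as stand-ins for `encode` and `decode`, and both function names are hypothetical. The second function shows the same kind of mismatch inside the standard library, where an integer can be formatted in exponent notation that the integer parser does not accept.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Stand-in for `decode::<X>(encode::<X>(x)) == Ok(x)`, with `to_string`
/// playing the encoder and `parse` playing the decoder.
fn round_trips<T>(value: T) -> bool
where
    T: Display + FromStr + PartialEq,
{
    match value.to_string().parse::<T>() {
        Ok(decoded) => decoded == value,
        Err(_) => false,
    }
}

/// Same check, but encoding the integer with the `{:e}` (exponent) format.
fn exp_round_trips(value: u64) -> bool {
    format!("{:e}", value).parse::<u64>().ok() == Some(value)
}
```

`round_trips` holds for ordinary integers and finite floats, while `exp_round_trips(1_000_000)` is false because the exponent form is rejected by the integer parser.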
I do not agree this is an esoteric use case - but I digress.