serde
serde copied to clipboard
Optimize internally tagged enums -- do not use internal buffer if tag is the first field
Fixes #1495
Also that change has one positive side-effect: if tag is the first field, negative effects from #1183 is eliminated, because buffering is not used
@RReverser, feel free to try to run your benchmarks against this branch
I cannot agree with such a question. For me, the runtime performance is much more important thing that compile-time performance. Things are written not for the pleasure of developers, but for solving customer problems.
Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?
I will try to study how to do performance measurements, but any guidance is welcome
Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.
That sounds odd tbh. Compile times are affecting only developers, while runtime affects every user of the library / application, which is way more impactful. Why choose compile-time over runtime perf here when we don't do that at any other levels of development (e.g. opt-level = 0
vs opt-level = 2
etc.)?
I've made some research and there is the results. I've created a library project with 1000 types and I've measured compilation time.
I've noticed small increasing of the compilation time, about 0.01 sec per type (or 7-30%). I think it is acceptable worth for the bigger runtime performance.
Test code and raw data
Created library cargo project with following lib.rs
content:
use serde::{Deserialize};
macro_rules! generate {
($(#[$counter:meta])*) => {
$(
const _: () = {
#[$counter]
#[derive(Deserialize)]
#[serde(tag = "tag")]
enum Node {
Unit,
Struct {
name: String,
list: Vec<Node>,
},
// Uncomment for "big enum" tests
/*
Newtype1(std::collections::HashMap<String, String>),
Newtype2(String),
Newtype3(u32),
Newtype4(f32),
Unit1,
Unit2,
Unit3,
Unit4,
Struct1 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct2 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct3 { f1: String, f2: u32, f3: bool, f4: f64 },
Struct4 { f1: String, f2: u32, f3: bool, f4: f64 },// */
}
};
)*
};
}
// Expanded manually for "expand" tests
generate!(
/// ...
/// 1000 lines
/// ...
);
Tests run with command
cargo +nightly build -Ztimings
Test PC
OS version: Windows_NT x64 10.0.18363 CPUs: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 1992)
Summary table
Small enum (2 variants), 1000 types
Derived both Serialize
and Deserialize
, types generated with generate!
macro
Deserialize + Serialize | master (c26101532509477132721a8c56b7024891aaf1b4) (100%) | PR (a169beee0d84fa0cad022357c974d05c223e32a0) | Diff |
---|---|---|---|
all | 47.37 = (46.96 + 47.10 + 48.06)/3 | 58.67 = (59.27 + 59.17 + 57.56)/3 | +11.29 (+29%) |
codegen | 3.28 = (46.96-43.95 + 47.10-43.92 + 48.06-44.42)/3 | 4.81 = (59.27-53.93 + 59.17-53.85 + 57.56-53.80)/3 | +1.53 (+10%) |
Derived only Deserialize
, types generated with generate!
macro
Deserialize | master (c26101532509477132721a8c56b7024891aaf1b4) (100%) | PR (a169beee0d84fa0cad022357c974d05c223e32a0) | Diff |
---|---|---|---|
all | 39.25 = (39.16 + 39.66 + 38.94)/3 | 50.08 = (49.70 + 51.62 + 48.93)/3 | +10.83 (+28%) |
codegen | 3.62 = (39.16-35.31 + 39.66-35.28 + 38.94-36.31)/3 | 4.53 = (49.70-46.17 + 51.62-45.29 + 48.93-45.21)/3 | +0.91 (+25%) |
Derived only Deserialize
, types written manually
Deserialize + expanded | master (c26101532509477132721a8c56b7024891aaf1b4) (100%) | PR (a169beee0d84fa0cad022357c974d05c223e32a0) | Diff |
---|---|---|---|
all | 38.76 = (38.38 + 39.67 + 38.23)/3 | 50.36 = (50.34 + 50.59 + 50.14)/3 | +11.60 (+30%) |
codegen | 3.24 = (38.38-35.41 + 39.67-35.65 + 38.23-35.50)/3 | 5.67 = (50.34-44.34 + 50.59-44.74 + 50.14-45.02)/3 | +2.42 (+75%) |
Big enum (14 variants), 1000 types
Derived only Deserialize
, types generated with generate!
macro
Deserialize | master (c26101532509477132721a8c56b7024891aaf1b4) (100%) | PR (a169beee0d84fa0cad022357c974d05c223e32a0) | Diff |
---|---|---|---|
all | 241.04 = (236.94 + 239.35 + 246.84)/3 | 257.01 = (257.75 + 258.76 + 254.70)/3 | +16.03 (+7%) |
codegen | 40.90 = (236.94-197.11 + 239.35-199.61 + 246.84-203.70)/3 | 46.63 = (257.75-211.25 + 258.76-210.17 + 254.70-209.91)/3 | +5.72 (+14%) |
Derived only Deserialize
, types written manually
Deserialize + expanded | master (c26101532509477132721a8c56b7024891aaf1b4) (100%) | PR (a169beee0d84fa0cad022357c974d05c223e32a0) | Diff |
---|---|---|---|
all | 238.41 = (238.49 + 236.58 + 240.16)/3 | 254.86 = (258.93 + 254.75 + 250.90)/3 | +16.45 (+7%) |
codegen | 39.70 = (238.49-198.46 + 236.58-199.66 + 240.16-198.02)/3 | 45.05 = (258.93-209.91 + 254.75-211.19 + 250.90-208.34)/3 | +5.35 (+13%) |
@Mingun What about runtime performance improvements?
I didn't measure it, maybe I should
Would this also fix the incorrect error messages caused by the internal buffer? https://github.com/serde-rs/serde/issues/1621
For the optimized case yes, it should