serde icon indicating copy to clipboard operation
serde copied to clipboard

Optimize internally tagged enums -- do not use internal buffer if tag is the first field

Open Mingun opened this issue 3 years ago • 7 comments

Fixes #1495

Also that change has one positive side-effect: if tag is the first field, negative effects from #1183 is eliminated, because buffering is not used

@RReverser, feel free to try to run your benchmarks against this branch

Mingun avatar Nov 03 '20 20:11 Mingun

I cannot agree with such a question. For me, the runtime performance is much more important thing that compile-time performance. Things are written not for the pleasure of developers, but for solving customer problems.

Would you be able to provide some measurements showing the impact of this change on the time to compile internally tagged enums?

I will try to study how to do performance measurements, but any guidance is welcome

Mingun avatar Feb 28 '21 10:02 Mingun

Overall I would rather optimize for lowering their compile time, not on performance or features at the expense of compile time.

That sounds odd tbh. Compile times are affecting only developers, while runtime affects every user of the library / application, which is way more impactful. Why choose compile-time over runtime perf here when we don't do that at any other levels of development (e.g. opt-level = 0 vs opt-level = 2 etc.)?

RReverser avatar Feb 28 '21 21:02 RReverser

I've made some research and there is the results. I've created a library project with 1000 types and I've measured compilation time.

I've noticed small increasing of the compilation time, about 0.01 sec per type (or 7-30%). I think it is acceptable worth for the bigger runtime performance.

Test code and raw data

serde-perf.zip

Created library cargo project with following lib.rs content:

use serde::{Deserialize};
macro_rules! generate {
  ($(#[$counter:meta])*) => {
    $(
      const _: () = {
        #[$counter]
        #[derive(Deserialize)]
        #[serde(tag = "tag")]
        enum Node {
          Unit,
          Struct {
            name: String,
            list: Vec<Node>,
          },
          // Uncomment for "big enum" tests
          /*
          Newtype1(std::collections::HashMap<String, String>),
          Newtype2(String),
          Newtype3(u32),
          Newtype4(f32),
          Unit1,
          Unit2,
          Unit3,
          Unit4,
          Struct1 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct2 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct3 { f1: String, f2: u32, f3: bool, f4: f64 },
          Struct4 { f1: String, f2: u32, f3: bool, f4: f64 },// */
        }
      };
    )*
  };
}
// Expanded manually for "expand" tests
generate!(
  /// ...
  /// 1000 lines
  /// ...
);

Tests run with command

cargo +nightly build -Ztimings

Test PC

OS version: Windows_NT x64 10.0.18363 CPUs: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 1992)

Summary table

Small enum (2 variants), 1000 types

Derived both Serialize and Deserialize, types generated with generate! macro

Deserialize + Serialize master (c26101532509477132721a8c56b7024891aaf1b4) (100%) PR (a169beee0d84fa0cad022357c974d05c223e32a0) Diff
all 47.37 = (46.96 + 47.10 + 48.06)/3 58.67 = (59.27 + 59.17 + 57.56)/3 +11.29 (+29%)
codegen 3.28 = (46.96-43.95 + 47.10-43.92 + 48.06-44.42)/3 4.81 = (59.27-53.93 + 59.17-53.85 + 57.56-53.80)/3 +1.53 (+10%)

Derived only Deserialize, types generated with generate! macro

Deserialize master (c26101532509477132721a8c56b7024891aaf1b4) (100%) PR (a169beee0d84fa0cad022357c974d05c223e32a0) Diff
all 39.25 = (39.16 + 39.66 + 38.94)/3 50.08 = (49.70 + 51.62 + 48.93)/3 +10.83 (+28%)
codegen 3.62 = (39.16-35.31 + 39.66-35.28 + 38.94-36.31)/3 4.53 = (49.70-46.17 + 51.62-45.29 + 48.93-45.21)/3 +0.91 (+25%)

Derived only Deserialize, types written manually

Deserialize + expanded master (c26101532509477132721a8c56b7024891aaf1b4) (100%) PR (a169beee0d84fa0cad022357c974d05c223e32a0) Diff
all 38.76 = (38.38 + 39.67 + 38.23)/3 50.36 = (50.34 + 50.59 + 50.14)/3 +11.60 (+30%)
codegen 3.24 = (38.38-35.41 + 39.67-35.65 + 38.23-35.50)/3 5.67 = (50.34-44.34 + 50.59-44.74 + 50.14-45.02)/3 +2.42 (+75%)

Big enum (14 variants), 1000 types

Derived only Deserialize, types generated with generate! macro

Deserialize master (c26101532509477132721a8c56b7024891aaf1b4) (100%) PR (a169beee0d84fa0cad022357c974d05c223e32a0) Diff
all 241.04 = (236.94 + 239.35 + 246.84)/3 257.01 = (257.75 + 258.76 + 254.70)/3 +16.03 (+7%)
codegen 40.90 = (236.94-197.11 + 239.35-199.61 + 246.84-203.70)/3 46.63 = (257.75-211.25 + 258.76-210.17 + 254.70-209.91)/3 +5.72 (+14%)

Derived only Deserialize, types written manually

Deserialize + expanded master (c26101532509477132721a8c56b7024891aaf1b4) (100%) PR (a169beee0d84fa0cad022357c974d05c223e32a0) Diff
all 238.41 = (238.49 + 236.58 + 240.16)/3 254.86 = (258.93 + 254.75 + 250.90)/3 +16.45 (+7%)
codegen 39.70 = (238.49-198.46 + 236.58-199.66 + 240.16-198.02)/3 45.05 = (258.93-209.91 + 254.75-211.19 + 250.90-208.34)/3 +5.35 (+13%)

Mingun avatar Mar 06 '21 16:03 Mingun

@Mingun What about runtime performance improvements?

pickfire avatar Mar 26 '21 17:03 pickfire

I didn't measure it, maybe I should

Mingun avatar Mar 26 '21 18:03 Mingun

Would this also fix the incorrect error messages caused by the internal buffer? https://github.com/serde-rs/serde/issues/1621

jarredholman avatar Jul 26 '22 15:07 jarredholman

For the optimized case yes, it should

Mingun avatar Jul 26 '22 16:07 Mingun