serde
serde copied to clipboard
Shrink derive output size
This PR has a series of commits that shrink the size of the code generated for derive(Serialize, Deserialize)
, particularly for large structs.
Here are some instruction count results for some synthetic benchmarks.
Benchmark | Profile | Scenario | % Change | Significance Factor? |
---|---|---|---|---|
big300 | check | full | -31.80% | 159.00x |
big300 | debug | full | -30.15% | 150.75x |
big300 | opt | full | -29.99% | 149.96x |
big200 | check | full | -29.44% | 147.19x |
big200 | debug | full | -26.94% | 134.69x |
big200 | opt | full | -26.83% | 134.15x |
big100 | check | full | -25.50% | 127.48x |
big100 | debug | full | -21.02% | 105.11x |
big100 | opt | full | -20.73% | 103.64x |
derive-serde | check | full | -6.50% | 32.50x |
derive-serde | opt | full | -3.97% | 19.85x |
derive-serde | debug | full | -3.76% | 18.82x |
big{1,2,3}00
each contain a single struct with {100,200,300} u32
fields that is annotated with derive(Serialize, Deserialize)
.
derive-serde
contains a variety of structs and enums. It is identical to https://github.com/rust-lang/rust/blob/master/src/test/ui/deriving/deriving-all-codegen.rs except with the builtin trait derives replaced with derive(Serialize, Deserialize)
.
What the table headers mean?
The headings are those used in https://perf.rust-lang.org/compare.html. The important one is the "% Change".
Any thoughts on whether this change is a good one? There are 32 positive emoji reactions on the description. I'm happy to make adjustments (e.g. moving that method to a free function), but it's not yet clear to me if this change is wanted by the serde developers. Thanks!
Summary: the wins in terms of size reduction in a "real world" project that I tested are unfortunately very small.
I was very happy to see this PR as I recently found out that generated serde code makes up a big part of the final binary of a project I'm working on as part of my day job. It's a web application with Rust backend that performs various tasks and uses serde derive for lots of different stuff. (It's even open source, if anyone's interested: link). So I thought it might be a valuable data point to see how this PR performs with a "production project".
I know that this PR is mainly motivated by compile times, but I thought checking it for binary output size is interesting too.
Unfortunately, the results were not as great as I has hoped for. I tested serde 1.0.139 vs. this branch (f8cc37c8), using [patch]
. I performed a release build with these adjustments:
[profile.release]
debug = 1
codegen-units = 1
lto = "thin"
[profile.release.package."*"]
debug = false
We need basic debug info to better debug errors in production. And it allows me to check per symbol size! Also, some custom --remap-path-prefix
flags are set and the resulting binary is compressed with objcopy --compress-debug-sections
.
But first, the final file sizes (the -stripped
files had strip
applied to them):
23602168 smaller-serde
23595960 with-1.0.139
14708592 smaller-serde-stripped
14720880 with-1.0.139-stripped
With this patch, the binary including debug info is even larger than with 1.0.139. That was unexpected! At least the stripped one shows a small win. I tried only checking the size of serde functions. I did that via this command chain:
llvm-nm-10 --print-size --size-sort --radix=d <binary>
| rg -F <filter>
| cut -d ' ' -f 2
| awk '{s+=$1} END {print s}'
Here is the result matrix for the two variables <filter>
and <binary>
. Note that my llvm-nm-10 did not demangle the symbols, that's the reason for the ..
there.
Filter: | serde |
serde..de..Deserialize |
serde..ser..Serialize |
---|---|---|---|
with-1.0.139 |
809,524 | 525,434 | 56,330 |
smaller-serde |
796,166 | 510,409 | 56,330 |
That's a roughly 13KiB reduction = 98.3% of the original size. A win is a win, but I had hoped for more and I feel like there are a lot of size optimization that could still be done.
I also looked at the biggest functions to see what changed there.
20 largest serde functions
with-1.0.139
:
6834 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
6936 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
7143 <tobira::search::confique_partial_meili_config::_::<impl serde::de::Deserialize for tobira::search::confique_partial_meili_config::PartialMeiliConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
7536 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
7595 <meilisearch_sdk::errors::_::<impl serde::de::Deserialize for meilisearch_sdk::errors::ErrorCode>::deserialize::__FieldVisitor as serde::de::Visitor>::visit_bytes
7771 <tobira::config::theme::confique_partial_theme_config::_::<impl serde::de::Deserialize for tobira::config::theme::confique_partial_theme_config::PartialThemeConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
7977 <toml::de::MapVisitor as serde::de::Deserializer>::deserialize_any
8026 <tobira::config::general::confique_partial_general_config::_::<impl serde::de::Deserialize for tobira::config::general::confique_partial_general_config::PartialGeneralConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
10939 <serde::de::impls::OptionVisitor<T> as serde::de::Visitor>::visit_some
11060 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
13154 <tobira::auth::confique_partial_auth_config::_::<impl serde::de::Deserialize for tobira::auth::confique_partial_auth_config::PartialAuthConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
13461 <tobira::search::event::_::<impl serde::de::Deserialize for tobira::search::event::Event>::deserialize::__Visitor as serde::de::Visitor>::visit_map
15906 <toml::de::ValueDeserializer as serde::de::Deserializer>::deserialize_struct
17591 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::TaskType>::deserialize
19177 tobira::sync::harvest::response::_::<impl serde::de::Deserialize for tobira::sync::harvest::response::HarvestItem>::deserialize
20449 <toml::de::MapVisitor as serde::de::MapAccess>::next_value_seed
22081 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::Task>::deserialize
24697 <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_any
26080 <juniper::http::GraphQLResponse<T> as serde::ser::Serialize>::serialize
95307 <tobira::config::confique_partial_config::_::<impl serde::de::Deserialize for tobira::config::confique_partial_config::PartialConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
smaller-serde
6605 <tobira::sync::confique_partial_sync_config::_::<impl serde::de::Deserialize for tobira::sync::confique_partial_sync_config::PartialSyncConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
6607 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
7143 <tobira::search::confique_partial_meili_config::_::<impl serde::de::Deserialize for tobira::search::confique_partial_meili_config::PartialMeiliConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
7212 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
7444 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
7595 <meilisearch_sdk::errors::_::<impl serde::de::Deserialize for meilisearch_sdk::errors::ErrorCode>::deserialize::__FieldVisitor as serde::de::Visitor>::visit_bytes
7744 <tobira::config::theme::confique_partial_theme_config::_::<impl serde::de::Deserialize for tobira::config::theme::confique_partial_theme_config::PartialThemeConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
7937 <tobira::config::general::confique_partial_general_config::_::<impl serde::de::Deserialize for tobira::config::general::confique_partial_general_config::PartialGeneralConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
7977 <toml::de::MapVisitor as serde::de::Deserializer>::deserialize_any
10923 <serde::de::impls::OptionVisitor<T> as serde::de::Visitor>::visit_some
11031 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
13239 <tobira::auth::confique_partial_auth_config::_::<impl serde::de::Deserialize for tobira::auth::confique_partial_auth_config::PartialAuthConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
13324 <tobira::search::event::_::<impl serde::de::Deserialize for tobira::search::event::Event>::deserialize::__Visitor as serde::de::Visitor>::visit_map
15929 <toml::de::ValueDeserializer as serde::de::Deserializer>::deserialize_struct
17396 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::TaskType>::deserialize
20435 <toml::de::MapVisitor as serde::de::MapAccess>::next_value_seed
21953 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::Task>::deserialize
26080 <juniper::http::GraphQLResponse<T> as serde::ser::Serialize>::serialize
32282 tobira::sync::harvest::response::_::<impl serde::de::Deserialize for tobira::sync::harvest::response::HarvestItem>::deserialize
93552 <tobira::config::confique_partial_config::_::<impl serde::de::Deserialize for tobira::config::confique_partial_config::PartialConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
A few things:
- The
tobira::sync::harvest....
function is significantly larger with this patch (19177 -> 32282). The struct definition is here. - I'm not sure why there are several
PhantomData as DeserializeSeed
impls in there. - Some functions stayed exactly the same size.
- The largest function is quite a lot larger than everything else, so it might be worth checking that out specifically. But I don't have time to dig into that now. I can roughly say that it's a a struct with 9 fields of type
Option<X>
whereX
is different per field and is always another struct with severalOption
fields. If anyone is interested in digging into that, I'm happy to help (as the confique thingy is by me).
Ok, I hope this investigation is useful at all. As someone with 800KiB of serde functions in my release artifact, I'm very interested in improvements in this area ^_^
Thank you Lukas, your writeup is enormously helpful in evaluating this PR. One thing I would want to look into is whether any of the "small serde" redesigns (miniserde, deser, …) are viable for your use case, or even do at all better on code size.
In response to @nnethercote's last comment: it is not unwanted, but it is unsolicited, and an unsolicited PR (i.e. not in response to a "help wanted" label or an issue discussion which I participate in) is always less likely to appear at the top of my priorities. The way to get a PR reviewed is by being the most impactful thing for me to dedicate time to according to my opinion, and my sense has been this is not that, since there are multiple other serde PRs that have been queued longer and which I would rather land over this. Recently though my biggest chunk of time is going to a major serde_yaml revamp since that crate has been neglected much worse and longer than this crate.
You called out how impressive the emoji reactions to this PR are, but that is not a signal I look at in deciding where to invest time. It is predominantly a measure of social media reach over anything else, and prioritizing PRs from whoever has the highest twitter followers, rather than whichever is the most impactful contribution to the project, is not a productive direction I think.
I made some tests with rust-analyzer
(it mostly uses serde
via lsp-types
) and the results are a mixed bag, too:
# run cargo clean, then cargo build --release
# v1.0.139, incremental = true, opt-level = 3
text data bss dec hex filename
25289491 1451288 3784 26744563 19816f3 target/release/rust-analyzer
# nn, incremental = true, opt-level = 3
text data bss dec hex filename
25296787 1448360 3784 26748931 1982803 target/release/rust-analyzer
# v1.0.139, incremental = true, opt-level = "s"
text data bss dec hex filename
22104455 1663648 3784 23771887 16abaef target/release/rust-analyzer
# nn, incremental = true, opt-level = "s"
text data bss dec hex filename
22115079 1660808 3784 23779671 16ad957 target/release/rust-analyzer
Notice how the text
increase is somewhat offset by data
, but only in the opt-level = 3
case. I haven't specifically checked the size of the serde
functions, but they add up to 1 144 089
bytes in the third case above.
Still, I think this is worth pursuing. serde
is in the top 10 of most downloaded crates, so any compile time or code size wins will add up across the ecosystem. I'm pretty sure that nobody has ever said "I love serde
because it compiles so quickly", but I've seen a lot of people actively avoiding it, or other proc macro crates.
And I'd like to thank nnethercote for putting in the effort to optimize not only the Rust compiler, but also many of the popular third-party crate.
@LukasKalbertodt, @lnicola: thanks for the data. Your results are expected, though I see now this wasn't clear from my original description. The change I made shrinks the size of the Rust code that the front end has to process, but the new functions are marked inline
, so by the time inlining happens (either at the MIR level or at the LLVM IR level) the final code will be very similar to the current. Which means this PR will help compile times but have little effect on binary size. If you removed the inline
annotations from the new functions things might be different, but ser/deser speed might also be slower.
For binary size, my measurements show that Deserialize::visit_map
is easily the biggest function. E.g for the stress test of a struct with 100 u32
fields (which is big but within the realms of possibility) the generated visit_map
is 2833 lines of code before this PR, and 2133 lines of code after this PR.
AFAICT, the generated visit_map
is able to handle struct fields being present in any order, even though the generated Serialize::serialize
always writes the fields in declaration order. If visit_map
could assume the fields were in declaration order it could be shorter.
AFAICT, the generated visit_map is able to handle struct fields being present in any order, even though the generated Serialize::serialize always writes the fields in declaration order. If visit_map could assume the fields were in declaration order it could be shorter.
AFAIK, the visitor is driven by the deserializer which can mean that it is driven by the order of the fields in the wire format. Hence it needs to be able to handle any order to be able to deserialize the output of systems other than Serde.
From the other hand a new struct attribute could be added, for example, #[serde(strict_order)]
which would allow only the order that is defined in the struct
Something that did likely also help would be an attribute to only allow structs and enums with named fields and variants (for json, toml, ...) or structs and enums with index based fields and variants (for bincode, ...), while disallowing the other for the type it is applied to. In the common case where you only need to serialize/deserialize a single format, that would cut the size of the derived Deserialize impl in half I think.
There should be one single non-generic
DeserializeSeed
type that is used for deserializing all fields of all data structures, because none of that logic is different depending on whose fields they are.
Something like this?