serde Shrink derive output size

Shrink derive output size

Open nnethercote opened this issue 1 year ago • 10 comments

This PR has a series of commits that shrink the size of the code generated for derive(Serialize, Deserialize), particularly for large structs.

Here are some instruction count results for some synthetic benchmarks.

Benchmark	Profile	Scenario	% Change	Significance Factor?
big300	check	full	-31.80%	159.00x
big300	debug	full	-30.15%	150.75x
big300	opt	full	-29.99%	149.96x
big200	check	full	-29.44%	147.19x
big200	debug	full	-26.94%	134.69x
big200	opt	full	-26.83%	134.15x
big100	check	full	-25.50%	127.48x
big100	debug	full	-21.02%	105.11x
big100	opt	full	-20.73%	103.64x
derive-serde	check	full	-6.50%	32.50x
derive-serde	opt	full	-3.97%	19.85x
derive-serde	debug	full	-3.76%	18.82x

big{1,2,3}00 each contain a single struct with {100,200,300} u32 fields that is annotated with derive(Serialize, Deserialize).

derive-serde contains a variety of structs and enums. It is identical to https://github.com/rust-lang/rust/blob/master/src/test/ui/deriving/deriving-all-codegen.rs except with the builtin trait derives replaced with derive(Serialize, Deserialize).

Jul 18 '22 07:07 nnethercote

What the table headers mean?

Jul 18 '22 09:07 Mingun

The headings are those used in https://perf.rust-lang.org/compare.html. The important one is the "% Change".

Jul 18 '22 10:07 nnethercote

Any thoughts on whether this change is a good one? There are 32 positive emoji reactions on the description. I'm happy to make adjustments (e.g. moving that method to a free function), but it's not yet clear to me if this change is wanted by the serde developers. Thanks!

Jul 26 '22 00:07 nnethercote

Summary: the wins in terms of size reduction in a "real world" project that I tested are unfortunately very small.

I was very happy to see this PR as I recently found out that generated serde code makes up a big part of the final binary of a project I'm working on as part of my day job. It's a web application with Rust backend that performs various tasks and uses serde derive for lots of different stuff. (It's even open source, if anyone's interested: link). So I thought it might be a valuable data point to see how this PR performs with a "production project".

I know that this PR is mainly motivated by compile times, but I thought checking it for binary output size is interesting too.

Unfortunately, the results were not as great as I has hoped for. I tested serde 1.0.139 vs. this branch (f8cc37c8), using [patch]. I performed a release build with these adjustments:

[profile.release]
debug = 1
codegen-units = 1
lto = "thin"

[profile.release.package."*"]
debug = false

We need basic debug info to better debug errors in production. And it allows me to check per symbol size! Also, some custom --remap-path-prefix flags are set and the resulting binary is compressed with objcopy --compress-debug-sections.

But first, the final file sizes (the -stripped files had strip applied to them):

23602168  smaller-serde
23595960  with-1.0.139
14708592  smaller-serde-stripped
14720880  with-1.0.139-stripped

With this patch, the binary including debug info is even larger than with 1.0.139. That was unexpected! At least the stripped one shows a small win. I tried only checking the size of serde functions. I did that via this command chain:

llvm-nm-10 --print-size --size-sort --radix=d <binary> 
    | rg -F <filter>
    | cut -d ' ' -f 2 
    | awk '{s+=$1} END {print s}'

Here is the result matrix for the two variables <filter> and <binary>. Note that my llvm-nm-10 did not demangle the symbols, that's the reason for the .. there.

Filter:	`serde`	`serde..de..Deserialize`	`serde..ser..Serialize`
`with-1.0.139`	809,524	525,434	56,330
`smaller-serde`	796,166	510,409	56,330

That's a roughly 13KiB reduction = 98.3% of the original size. A win is a win, but I had hoped for more and I feel like there are a lot of size optimization that could still be done.

I also looked at the biggest functions to see what changed there.

20 largest serde functions

with-1.0.139:

 6834 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 6936 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 7143 <tobira::search::confique_partial_meili_config::_::<impl serde::de::Deserialize for tobira::search::confique_partial_meili_config::PartialMeiliConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 7536 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 7595 <meilisearch_sdk::errors::_::<impl serde::de::Deserialize for meilisearch_sdk::errors::ErrorCode>::deserialize::__FieldVisitor as serde::de::Visitor>::visit_bytes
 7771 <tobira::config::theme::confique_partial_theme_config::_::<impl serde::de::Deserialize for tobira::config::theme::confique_partial_theme_config::PartialThemeConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 7977 <toml::de::MapVisitor as serde::de::Deserializer>::deserialize_any
 8026 <tobira::config::general::confique_partial_general_config::_::<impl serde::de::Deserialize for tobira::config::general::confique_partial_general_config::PartialGeneralConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
10939 <serde::de::impls::OptionVisitor<T> as serde::de::Visitor>::visit_some
11060 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
13154 <tobira::auth::confique_partial_auth_config::_::<impl serde::de::Deserialize for tobira::auth::confique_partial_auth_config::PartialAuthConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
13461 <tobira::search::event::_::<impl serde::de::Deserialize for tobira::search::event::Event>::deserialize::__Visitor as serde::de::Visitor>::visit_map
15906 <toml::de::ValueDeserializer as serde::de::Deserializer>::deserialize_struct
17591 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::TaskType>::deserialize
19177 tobira::sync::harvest::response::_::<impl serde::de::Deserialize for tobira::sync::harvest::response::HarvestItem>::deserialize
20449 <toml::de::MapVisitor as serde::de::MapAccess>::next_value_seed
22081 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::Task>::deserialize
24697 <serde::__private::de::content::ContentDeserializer<E> as serde::de::Deserializer>::deserialize_any
26080 <juniper::http::GraphQLResponse<T> as serde::ser::Serialize>::serialize
95307 <tobira::config::confique_partial_config::_::<impl serde::de::Deserialize for tobira::config::confique_partial_config::PartialConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map

smaller-serde

 6605 <tobira::sync::confique_partial_sync_config::_::<impl serde::de::Deserialize for tobira::sync::confique_partial_sync_config::PartialSyncConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 6607 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 7143 <tobira::search::confique_partial_meili_config::_::<impl serde::de::Deserialize for tobira::search::confique_partial_meili_config::PartialMeiliConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 7212 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 7444 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 7595 <meilisearch_sdk::errors::_::<impl serde::de::Deserialize for meilisearch_sdk::errors::ErrorCode>::deserialize::__FieldVisitor as serde::de::Visitor>::visit_bytes
 7744 <tobira::config::theme::confique_partial_theme_config::_::<impl serde::de::Deserialize for tobira::config::theme::confique_partial_theme_config::PartialThemeConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 7937 <tobira::config::general::confique_partial_general_config::_::<impl serde::de::Deserialize for tobira::config::general::confique_partial_general_config::PartialGeneralConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 7977 <toml::de::MapVisitor as serde::de::Deserializer>::deserialize_any
10923 <serde::de::impls::OptionVisitor<T> as serde::de::Visitor>::visit_some
11031 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
13239 <tobira::auth::confique_partial_auth_config::_::<impl serde::de::Deserialize for tobira::auth::confique_partial_auth_config::PartialAuthConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map
13324 <tobira::search::event::_::<impl serde::de::Deserialize for tobira::search::event::Event>::deserialize::__Visitor as serde::de::Visitor>::visit_map
15929 <toml::de::ValueDeserializer as serde::de::Deserializer>::deserialize_struct
17396 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::TaskType>::deserialize
20435 <toml::de::MapVisitor as serde::de::MapAccess>::next_value_seed
21953 meilisearch_sdk::tasks::_::<impl serde::de::Deserialize for meilisearch_sdk::tasks::Task>::deserialize
26080 <juniper::http::GraphQLResponse<T> as serde::ser::Serialize>::serialize
32282 tobira::sync::harvest::response::_::<impl serde::de::Deserialize for tobira::sync::harvest::response::HarvestItem>::deserialize
93552 <tobira::config::confique_partial_config::_::<impl serde::de::Deserialize for tobira::config::confique_partial_config::PartialConfig>::deserialize::__Visitor as serde::de::Visitor>::visit_map

A few things:

The tobira::sync::harvest.... function is significantly larger with this patch (19177 -> 32282). The struct definition is here.
I'm not sure why there are several PhantomData as DeserializeSeed impls in there.
Some functions stayed exactly the same size.
The largest function is quite a lot larger than everything else, so it might be worth checking that out specifically. But I don't have time to dig into that now. I can roughly say that it's a a struct with 9 fields of type Option<X> where X is different per field and is always another struct with several Option fields. If anyone is interested in digging into that, I'm happy to help (as the confique thingy is by me).

Ok, I hope this investigation is useful at all. As someone with 800KiB of serde functions in my release artifact, I'm very interested in improvements in this area ^_^

Jul 26 '22 07:07 LukasKalbertodt

Thank you Lukas, your writeup is enormously helpful in evaluating this PR. One thing I would want to look into is whether any of the "small serde" redesigns (miniserde, deser, …) are viable for your use case, or even do at all better on code size.

In response to @nnethercote's last comment: it is not unwanted, but it is unsolicited, and an unsolicited PR (i.e. not in response to a "help wanted" label or an issue discussion which I participate in) is always less likely to appear at the top of my priorities. The way to get a PR reviewed is by being the most impactful thing for me to dedicate time to according to my opinion, and my sense has been this is not that, since there are multiple other serde PRs that have been queued longer and which I would rather land over this. Recently though my biggest chunk of time is going to a major serde_yaml revamp since that crate has been neglected much worse and longer than this crate.

You called out how impressive the emoji reactions to this PR are, but that is not a signal I look at in deciding where to invest time. It is predominantly a measure of social media reach over anything else, and prioritizing PRs from whoever has the highest twitter followers, rather than whichever is the most impactful contribution to the project, is not a productive direction I think.

Jul 26 '22 10:07 dtolnay

I made some tests with rust-analyzer (it mostly uses serde via lsp-types) and the results are a mixed bag, too:

# run cargo clean, then cargo build --release

# v1.0.139, incremental = true, opt-level = 3
   text	   data	    bss	    dec	    hex	filename
25289491	1451288	   3784	26744563	19816f3	target/release/rust-analyzer

# nn, incremental = true, opt-level = 3
   text	   data	    bss	    dec	    hex	filename
25296787	1448360	   3784	26748931	1982803	target/release/rust-analyzer

# v1.0.139, incremental = true, opt-level = "s"
   text	   data	    bss	    dec	    hex	filename
22104455	1663648	   3784	23771887	16abaef	target/release/rust-analyzer

# nn, incremental = true, opt-level = "s"
   text	   data	    bss	    dec	    hex	filename
22115079	1660808	   3784	23779671	16ad957	target/release/rust-analyzer

Notice how the text increase is somewhat offset by data, but only in the opt-level = 3 case. I haven't specifically checked the size of the serde functions, but they add up to 1 144 089 bytes in the third case above.

Still, I think this is worth pursuing. serde is in the top 10 of most downloaded crates, so any compile time or code size wins will add up across the ecosystem. I'm pretty sure that nobody has ever said "I love serde because it compiles so quickly", but I've seen a lot of people actively avoiding it, or other proc macro crates.

And I'd like to thank nnethercote for putting in the effort to optimize not only the Rust compiler, but also many of the popular third-party crate.

Jul 26 '22 11:07 lnicola

@LukasKalbertodt, @lnicola: thanks for the data. Your results are expected, though I see now this wasn't clear from my original description. The change I made shrinks the size of the Rust code that the front end has to process, but the new functions are marked inline, so by the time inlining happens (either at the MIR level or at the LLVM IR level) the final code will be very similar to the current. Which means this PR will help compile times but have little effect on binary size. If you removed the inline annotations from the new functions things might be different, but ser/deser speed might also be slower.

For binary size, my measurements show that Deserialize::visit_map is easily the biggest function. E.g for the stress test of a struct with 100 u32 fields (which is big but within the realms of possibility) the generated visit_map is 2833 lines of code before this PR, and 2133 lines of code after this PR.

AFAICT, the generated visit_map is able to handle struct fields being present in any order, even though the generated Serialize::serialize always writes the fields in declaration order. If visit_map could assume the fields were in declaration order it could be shorter.

Jul 26 '22 23:07 nnethercote

AFAICT, the generated visit_map is able to handle struct fields being present in any order, even though the generated Serialize::serialize always writes the fields in declaration order. If visit_map could assume the fields were in declaration order it could be shorter.

AFAIK, the visitor is driven by the deserializer which can mean that it is driven by the order of the fields in the wire format. Hence it needs to be able to handle any order to be able to deserialize the output of systems other than Serde.

Jul 27 '22 05:07 adamreichold

From the other hand a new struct attribute could be added, for example, #[serde(strict_order)] which would allow only the order that is defined in the struct

Jul 27 '22 06:07 Mingun

Something that did likely also help would be an attribute to only allow structs and enums with named fields and variants (for json, toml, ...) or structs and enums with index based fields and variants (for bincode, ...), while disallowing the other for the type it is applied to. In the common case where you only need to serialize/deserialize a single format, that would cut the size of the derived Deserialize impl in half I think.

Jul 27 '22 08:07 bjorn3

There should be one single non-generic DeserializeSeed type that is used for deserializing all fields of all data structures, because none of that logic is different depending on whose fields they are.

Something like this?

Aug 09 '23 13:08 Mingun

serde serde copied to clipboard

Shrink derive output size

serde
serde copied to clipboard