msgpack-rust
Reduce monomorphization output by using more specialized code paths for deserializers
This PR tries to reduce the size of the deserializers generated by rmp-serde. Instead of matching against every marker, including irrelevant ones, each code path now uses a specialized match expression, generated by a macro, that handles only the relevant possibilities.
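To illustrate the idea, here is a minimal sketch of a macro-generated, specialized match. It is not the macro from this PR: `Marker`, `Error`, `match_marker!`, `read_u16`, `read_array_len` and `read_bool` are all hypothetical, simplified stand-ins, and the real rmp marker set is much larger. The point is only that each reader lists the markers it cares about, and everything else collapses into a single fallback arm.

```rust
/// Hypothetical, heavily reduced stand-in for the MessagePack marker set.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Marker {
    Nil,
    True,
    False,
    FixArray(u8),
    Array16,
    FixMap(u8),
    Map16,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum Error {
    TypeMismatch(Marker),
    UnexpectedEof,
}

/// Expands to a `match` that handles only the listed markers.
/// Every other marker falls through to one shared error arm, so each
/// monomorphized caller carries a handful of arms instead of the full set.
macro_rules! match_marker {
    ($marker:expr, { $($pat:pat => $body:expr),+ $(,)? }) => {
        match $marker {
            $($pat => $body,)+
            other => Err(Error::TypeMismatch(other)),
        }
    };
}

/// Reads a big-endian u16 from the bytes that follow the marker.
fn read_u16(rest: &[u8]) -> Result<u16, Error> {
    match rest {
        [a, b, ..] => Ok(u16::from_be_bytes([*a, *b])),
        _ => Err(Error::UnexpectedEof),
    }
}

/// Only array markers are relevant when a sequence is expected.
fn read_array_len(marker: Marker, rest: &[u8]) -> Result<u32, Error> {
    match_marker!(marker, {
        Marker::FixArray(n) => Ok(u32::from(n)),
        Marker::Array16 => read_u16(rest).map(u32::from),
    })
}

/// Only the two boolean markers are relevant when a bool is expected.
fn read_bool(marker: Marker) -> Result<bool, Error> {
    match_marker!(marker, {
        Marker::True => Ok(true),
        Marker::False => Ok(false),
    })
}
```

With this shape, `read_array_len(Marker::FixArray(3), &[])` succeeds, while passing `Marker::Nil` hits the shared fallback arm and returns a type-mismatch error without any per-marker handling being generated for it.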
Filtered output of cargo llvm-lines on one of the biggest crates of Garage, before this change (rows are in ascending order, so the TOTAL row and the column header appear at the bottom):
3456 (0.3%, 45.3%) 72 (0.2%, 26.4%) tokio::runtime::task::core::Core<T,S>::set_stage
3461 (0.3%, 45.0%) 63 (0.2%, 26.1%) <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
3816 (0.3%, 44.7%) 72 (0.2%, 25.9%) tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
4509 (0.4%, 40.2%) 9 (0.0%, 22.1%) garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
4639 (0.4%, 39.8%) 63 (0.2%, 22.1%) <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
4644 (0.4%, 39.4%) 36 (0.1%, 21.9%) tokio::runtime::scheduler::current_thread::Handle::spawn
--
5826 (0.5%, 35.0%) 255 (0.8%, 17.1%) core::result::Result<T,E>::map
5916 (0.5%, 34.5%) 102 (0.3%, 16.3%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
6379 (0.5%, 34.0%) 94 (0.3%, 15.9%) core::iter::traits::iterator::Iterator::try_fold
--
14904 (1.3%, 24.3%) 432 (1.4%, 3.3%) std::panic::catch_unwind
30468 (2.6%, 23.1%) 196 (0.6%, 1.9%) rmp_serde::decode::read_str_data
103382 (8.8%, 20.5%) 197 (0.6%, 1.3%) rmp_serde::decode::any_num
138318 (11.7%, 11.7%) 196 (0.6%, 0.6%) rmp_serde::decode::Deserializer<R,C>::any_inner
1180351 31016 (TOTAL)
----- ------ -------------
Lines Copies Function name
and after this change:
3212 (0.4%, 33.3%) 1 (0.0%, 22.8%) garage_api_admin::router_v2::<impl garage_api_admin::api::AdminApiRequest>::from_request::{{closure}}
3216 (0.4%, 33.0%) 16 (0.1%, 22.8%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::collect_seq
3240 (0.4%, 32.6%) 72 (0.3%, 22.8%) tokio::runtime::task::harness::Harness<T,S>::release
--
3456 (0.4%, 31.2%) 72 (0.3%, 22.1%) tokio::runtime::task::core::Core<T,S>::set_stage
3461 (0.4%, 30.8%) 63 (0.2%, 21.9%) <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
3522 (0.4%, 30.4%) 15 (0.1%, 21.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_map
3816 (0.4%, 30.0%) 72 (0.3%, 21.6%) tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
3908 (0.4%, 29.2%) 39 (0.1%, 20.7%) alloc::vec::Vec<T,A>::extend_desugared
3972 (0.4%, 28.8%) 12 (0.0%, 20.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_seq
3999 (0.4%, 28.3%) 30 (0.1%, 20.5%) <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
--
4509 (0.5%, 25.6%) 9 (0.0%, 19.3%) garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
4639 (0.5%, 25.1%) 63 (0.2%, 19.3%) <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
4644 (0.5%, 24.6%) 36 (0.1%, 19.1%) tokio::runtime::scheduler::current_thread::Handle::spawn
--
5402 (0.6%, 21.9%) 235 (0.9%, 17.5%) alloc::boxed::Box<T>::new
5916 (0.6%, 21.3%) 102 (0.4%, 16.6%) <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
6212 (0.7%, 20.7%) 767 (2.9%, 16.2%) <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
--
11149 (1.2%, 11.6%) 362 (1.4%, 6.0%) tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
11539 (1.3%, 10.4%) 73 (0.3%, 4.7%) rmp_serde::decode::read_str_data
12427 (1.4%, 9.1%) 571 (2.2%, 4.4%) <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
14904 (1.6%, 7.7%) 432 (1.6%, 2.2%) std::panic::catch_unwind
25430 (2.8%, 6.1%) 72 (0.3%, 0.6%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_identifier
30260 (3.3%, 3.3%) 75 (0.3%, 0.3%) <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_struct
911552 26229 (TOTAL)
----- ------ -------------
Lines Copies Function name
As you can see, this reduces the size of the LLVM IR from 1180351 to 911552 lines, a reduction of 268799 lines, or about 22.8% of all code generated by this crate.
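For anyone who wants to recompute the figures, the arithmetic can be sketched as follows. The helper name `reduction` is made up for this example; the inputs are the TOTAL rows of the two listings above.

```rust
/// Returns the absolute saving and the saving as a percentage of `before`.
/// Hypothetical helper, assuming `after <= before`.
fn reduction(before: u64, after: u64) -> (u64, f64) {
    let saved = before - after;
    (saved, 100.0 * saved as f64 / before as f64)
}

/// LLVM IR line totals (before, after) taken from the TOTAL rows.
const LINES: (u64, u64) = (1_180_351, 911_552);
/// Function copy totals (before, after) taken from the TOTAL rows.
const COPIES: (u64, u64) = (31_016, 26_229);
```

Calling `reduction(LINES.0, LINES.1)` gives the line saving quoted in the text, and `reduction(COPIES.0, COPIES.1)` does the same for the number of monomorphized copies.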
In this example, most of the serialization and deserialization routines correspond to (de)serializing structs defined in this file