msgpack-rust

Reduce monomorphization output by using more specialized code paths for deserializers

Open · Alexis211 opened this issue 10 months ago · 1 comment

This PR tries to reduce the size of the deserializers generated by rmp-serde. Instead of matching against every marker, including irrelevant ones, each code path uses a specialized match expression, generated by a macro, that handles only the markers relevant to it.
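To illustrate the general idea, here is a minimal sketch of a macro-generated, narrowed match. It is not the code from this PR: the `Marker` enum, `marker_from_u8`, the `match_markers!` macro and `read_bool` are all hypothetical names invented for this example, and only a handful of MessagePack markers are modeled. Only the byte values (0xc0 nil, 0xc2 false, 0xc3 true, 0xcc uint 8, 0xa0..=0xbf fixstr) come from the MessagePack format.

```rust
// Hypothetical sketch: none of these names are rmp-serde internals.

/// A small subset of MessagePack markers, enough to show the pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Marker {
    Nil,        // 0xc0
    False,      // 0xc2
    True,       // 0xc3
    U8,         // 0xcc
    FixStr(u8), // 0xa0..=0xbf, low 5 bits carry the length
    Other(u8),  // everything this sketch does not model
}

fn marker_from_u8(b: u8) -> Marker {
    match b {
        0xc0 => Marker::Nil,
        0xc2 => Marker::False,
        0xc3 => Marker::True,
        0xcc => Marker::U8,
        0xa0..=0xbf => Marker::FixStr(b & 0x1f),
        other => Marker::Other(other),
    }
}

/// Expands to a `match` that lists only the arms the caller asked for,
/// plus a single shared error arm. A generic "handle any marker" routine
/// would instead be instantiated in full for every caller.
macro_rules! match_markers {
    ($marker:expr, $expected:literal, { $($pat:pat => $body:expr),+ $(,)? }) => {
        match $marker {
            $($pat => Ok($body),)+
            other => Err(format!("expected {}, found {:?}", $expected, other)),
        }
    };
}

/// A bool can only be encoded as `True` or `False`, so only those two
/// markers are matched here.
fn read_bool(input: &[u8]) -> Result<bool, String> {
    let first = *input
        .first()
        .ok_or_else(|| "unexpected end of input".to_string())?;
    match_markers!(marker_from_u8(first), "bool", {
        Marker::True => true,
        Marker::False => false,
    })
}
```

Calling `read_bool(&[0xc3])` yields `Ok(true)`, while a nil byte (`0xc0`) or empty input yields an `Err`. The design point is that each specialized reader only pays, in generated code, for the arms it can actually accept.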

Filtered output of cargo llvm-lines on one of the biggest crates of Garage, before this change:

     3456 (0.3%, 45.3%)     72 (0.2%, 26.4%)  tokio::runtime::task::core::Core<T,S>::set_stage
     3461 (0.3%, 45.0%)     63 (0.2%, 26.1%)  <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
     3816 (0.3%, 44.7%)     72 (0.2%, 25.9%)  tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
     4509 (0.4%, 40.2%)      9 (0.0%, 22.1%)  garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
     4639 (0.4%, 39.8%)     63 (0.2%, 22.1%)  <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
     4644 (0.4%, 39.4%)     36 (0.1%, 21.9%)  tokio::runtime::scheduler::current_thread::Handle::spawn
--
     5826 (0.5%, 35.0%)    255 (0.8%, 17.1%)  core::result::Result<T,E>::map
     5916 (0.5%, 34.5%)    102 (0.3%, 16.3%)  <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
     6379 (0.5%, 34.0%)     94 (0.3%, 15.9%)  core::iter::traits::iterator::Iterator::try_fold
--
    14904 (1.3%, 24.3%)    432 (1.4%,  3.3%)  std::panic::catch_unwind
    30468 (2.6%, 23.1%)    196 (0.6%,  1.9%)  rmp_serde::decode::read_str_data
   103382 (8.8%, 20.5%)    197 (0.6%,  1.3%)  rmp_serde::decode::any_num
   138318 (11.7%, 11.7%)   196 (0.6%,  0.6%)  rmp_serde::decode::Deserializer<R,C>::any_inner
  1180351                31016                (TOTAL)
  -----                 ------               -------------
  Lines                 Copies               Function name

and after this change:

    3212 (0.4%, 33.3%)      1 (0.0%, 22.8%)  garage_api_admin::router_v2::<impl garage_api_admin::api::AdminApiRequest>::from_request::{{closure}}
    3216 (0.4%, 33.0%)     16 (0.1%, 22.8%)  <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::collect_seq
    3240 (0.4%, 32.6%)     72 (0.3%, 22.8%)  tokio::runtime::task::harness::Harness<T,S>::release
--
    3456 (0.4%, 31.2%)     72 (0.3%, 22.1%)  tokio::runtime::task::core::Core<T,S>::set_stage
    3461 (0.4%, 30.8%)     63 (0.2%, 21.9%)  <rmp_serde::decode::MapAccess<R,C> as serde::de::MapAccess>::next_key_seed
    3522 (0.4%, 30.4%)     15 (0.1%, 21.6%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_map
    3816 (0.4%, 30.0%)     72 (0.3%, 21.6%)  tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
--
    3908 (0.4%, 29.2%)     39 (0.1%, 20.7%)  alloc::vec::Vec<T,A>::extend_desugared
    3972 (0.4%, 28.8%)     12 (0.0%, 20.6%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_seq
    3999 (0.4%, 28.3%)     30 (0.1%, 20.5%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
--
    4509 (0.5%, 25.6%)      9 (0.0%, 19.3%)  garage_net::endpoint::Endpoint<M,H>::call_streaming::{{closure}}
    4639 (0.5%, 25.1%)     63 (0.2%, 19.3%)  <rmp_serde::decode::SeqAccess<R,C> as serde::de::SeqAccess>::next_element_seed
    4644 (0.5%, 24.6%)     36 (0.1%, 19.1%)  tokio::runtime::scheduler::current_thread::Handle::spawn
--
    5402 (0.6%, 21.9%)    235 (0.9%, 17.5%)  alloc::boxed::Box<T>::new
    5916 (0.6%, 21.3%)    102 (0.4%, 16.6%)  <&mut rmp_serde::encode::Serializer<W,C> as serde::ser::Serializer>::serialize_newtype_variant
    6212 (0.7%, 20.7%)    767 (2.9%, 16.2%)  <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
--
   11149 (1.2%, 11.6%)    362 (1.4%,  6.0%)  tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
   11539 (1.3%, 10.4%)     73 (0.3%,  4.7%)  rmp_serde::decode::read_str_data
   12427 (1.4%,  9.1%)    571 (2.2%,  4.4%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
   14904 (1.6%,  7.7%)    432 (1.6%,  2.2%)  std::panic::catch_unwind
   25430 (2.8%,  6.1%)     72 (0.3%,  0.6%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_identifier
   30260 (3.3%,  3.3%)     75 (0.3%,  0.3%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde::de::Deserializer>::deserialize_struct
  911552                26229                (TOTAL)
  -----                 ------               -------------
  Lines                 Copies               Function name

As you can see, this reduces the size of the LLVM IR by 268799 lines (from 1180351 to 911552), or about 22.8% of all code generated by this crate.

In this example, most of the serialization and deserialization routines correspond to (de)serializing structs defined in this file.

Alexis211 · Feb 05 '25 17:02