refactor: Reduce how much code gets monomorphized

Open Marwes opened this issue 3 years ago • 0 comments

Motivation

tonic has a lot of generic code, which causes a lot of monomorphization and therefore bloated compile times and binaries.

Solution

By extracting code that does not need to be generic over a type parameter into non-generic (or less generic) functions, we can reduce how much code gets instantiated and passed to LLVM.

This gives a ~23% reduction in emitted LLVM IR (according to `cargo llvm-lines -p examples --bin streaming-client`) and reduces the compilation time of `cargo build --release --bin streaming-client` by ~1.5s (21s to 19.5s). (Measured by just switching branches between this one and master to force recompilations.)
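As an illustration of the extraction described above, here is a minimal sketch of the pattern. It is not code from tonic: `describe` and `describe_inner` are hypothetical names chosen for the example. The generic function is kept as a thin shim, and everything that does not depend on the type parameter moves into a non-generic inner function.

```rust
use std::fmt::Display;

/// Generic shim: the compiler instantiates one copy per concrete `T`,
/// so it only does the part that actually depends on `T`.
pub fn describe<T: Display>(value: T) -> String {
    /// Non-generic helper: it has no type parameters, so a single copy
    /// is generated and shared by every instantiation of `describe`.
    fn describe_inner(text: &str) -> String {
        let mut out = String::with_capacity(text.len() + 2);
        out.push('[');
        out.push_str(text.trim());
        out.push(']');
        out
    }

    // The only `T`-dependent step is the conversion to a string.
    describe_inner(&value.to_string())
}
```

Calling `describe(42)`, `describe("  hello ")` and `describe(1.5)` produces three small copies of the shim but only one copy of `describe_inner`. If the whole body lived in the generic function, all of it would be duplicated for each type.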

Before

  Lines         Copies       Function name
  -----         ------       -------------
  45542 (100%)  1250 (100%)  (TOTAL)
   3588 (7.9%)     3 (0.2%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}
   3320 (7.3%)     3 (0.2%)  tonic::codec::encode::encode::{{closure}}
   1282 (2.8%)     1 (0.1%)  <tonic::codec::decode::Streaming<T> as futures_core::stream::Stream>::poll_next
   1035 (2.3%)    19 (1.5%)  core::result::Result<T,E>::map_err
    997 (2.2%)    21 (1.7%)  <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    924 (2.0%)    21 (1.7%)  core::pin::Pin<&mut T>::map_unchecked_mut
    872 (1.9%)     2 (0.2%)  tonic::transport::channel::Channel::connect::{{closure}}
    760 (1.7%)    95 (7.6%)  core::pin::Pin<P>::new_unchecked
    720 (1.6%)     2 (0.2%)  streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
    710 (1.6%)     1 (0.1%)  tonic::codec::decode::Streaming<T>::decode_chunk
    621 (1.4%)     8 (0.6%)  std::thread::local::LocalKey<T>::try_with
    618 (1.4%)     3 (0.2%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_trailers
    605 (1.3%)     1 (0.1%)  prost::encoding::decode_varint_slice
    566 (1.2%)    13 (1.0%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    555 (1.2%)     3 (0.2%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
    455 (1.0%)     7 (0.6%)  tonic::request::Request<T>::map
    430 (0.9%)     1 (0.1%)  streaming_client::main::{{closure}}
    420 (0.9%)     9 (0.7%)  tonic::codec::encode::encode::{{closure}}::{{closure}}
    417 (0.9%)     1 (0.1%)  tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
    414 (0.9%)     9 (0.7%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
    373 (0.8%)     1 (0.1%)  core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<futures_util::stream::once::Once<futures_util::future::ready::Ready<streaming_client::pb::EchoRequest>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
    373 (0.8%)     1 (0.1%)  core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<tokio_stream::stream_ext::throttle::Throttle<tokio_stream::stream_ext::map::Map<tokio_stream::iter::Iter<core::ops::range::Range<usize>>,streaming_client::echo_requests_iter::{{closure}}>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
    367 (0.8%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}

After

  Lines         Copies       Function name
  -----         ------       -------------
  35425 (100%)  1074 (100%)  (TOTAL)
   1587 (4.5%)     3 (0.3%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}
   1035 (2.9%)    19 (1.8%)  core::result::Result<T,E>::map_err
   1024 (2.9%)    21 (2.0%)  <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
    924 (2.6%)    21 (2.0%)  core::pin::Pin<&mut T>::map_unchecked_mut
    912 (2.6%)     6 (0.6%)  tonic::codec::encode::encode::{{closure}}::{{closure}}
    872 (2.5%)     2 (0.2%)  tonic::transport::channel::Channel::connect::{{closure}}
    720 (2.0%)     2 (0.2%)  streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
    688 (1.9%)    86 (8.0%)  core::pin::Pin<P>::new_unchecked
    605 (1.7%)     1 (0.1%)  prost::encoding::decode_varint_slice
    591 (1.7%)     3 (0.3%)  <futures_util::stream::try_stream::and_then::AndThen<St,Fut,F> as futures_core::stream::Stream>::poll_next
    586 (1.7%)    13 (1.2%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    582 (1.6%)     3 (0.3%)  <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
    455 (1.3%)     7 (0.7%)  tonic::request::Request<T>::map
    430 (1.2%)     1 (0.1%)  streaming_client::main::{{closure}}
    420 (1.2%)     9 (0.8%)  tonic::codec::encode::encode::{{closure}}::{{closure}}::{{closure}}
    417 (1.2%)     9 (0.8%)  tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
    417 (1.2%)     1 (0.1%)  tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
    391 (1.1%)     5 (0.5%)  std::thread::local::LocalKey<T>::try_with
    367 (1.0%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}
    347 (1.0%)     1 (0.1%)  tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}
    324 (0.9%)     2 (0.2%)  tonic::transport::service::connection::Connection::connect::{{closure}}
    320 (0.9%)     2 (0.2%)  tokio::park::thread::CachedParkThread::block_on
    314 (0.9%)     4 (0.4%)  tokio::coop::with_budget::{{closure}}
    312 (0.9%)     3 (0.3%)  tonic::codec::encode::encode
    310 (0.9%)     1 (0.1%)  streaming_client::streaming_echo::{{closure}}
    306 (0.9%)     1 (0.1%)  streaming_client::bidirectional_streaming_echo_throttle::{{closure}}
    306 (0.9%)     1 (0.1%)  streaming_client::pb::echo_client::EchoClient<tonic::transport::channel::Channel>::connect::{{closure}}

Closes #849 (supersedes it)

Marwes avatar Jul 08 '22 13:07 Marwes

Thanks for taking this on! It would be interesting to know why these changes reduce the LLVM IR bloat.

Every combination of types that these functions get called with gets its own instantiation of the code (it is monomorphized). So if we can shrink these generic functions, usually by moving code into less generic or non-generic functions, Rust/LLVM can reuse those new functions across every instantiation of the generic function.

Marwes avatar Aug 10 '22 09:08 Marwes

Did a couple more tweaks to reuse code when the output type is the same.
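A hypothetical sketch of what reusing code by output type can look like (the names `run_and_wrap` and `finish` are invented for illustration and are not taken from the tonic change): the part of a function that depends only on the output type is split into a helper that is generic over that type alone.

```rust
/// Generic over both the closure type `F` and its output `T`.
/// Every distinct closure gets its own copy, so the body stays minimal.
pub fn run_and_wrap<F, T>(f: F) -> Result<T, String>
where
    F: FnOnce() -> Option<T>,
{
    finish(f())
}

/// Generic over `T` only. Callers that pass different closures but
/// produce the same `T` all share one instantiation of this function.
fn finish<T>(value: Option<T>) -> Result<T, String> {
    match value {
        Some(v) => Ok(v),
        None => Err(String::from("no value produced")),
    }
}
```

With ten different closures that all return `Option<u32>`, `run_and_wrap` is instantiated ten times but `finish::<u32>` only once. How much this saves in practice depends on inlining and on how large the shared part is.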

Marwes avatar Aug 10 '22 10:08 Marwes

With the new changes (and using a different/newer compiler) we go from 47011 to 33329 (-29%)

Marwes avatar Aug 10 '22 11:08 Marwes

Nice! Seems like CI is failing?

LucioFranco avatar Aug 19 '22 15:08 LucioFranco

Thanks!

LucioFranco avatar Aug 23 '22 18:08 LucioFranco