refactor: Reduce how much code gets monomorphized
Motivation
tonic has a lot of generic code which causes a lot of monormorphization and therefore bloated compile times and binaries.
Solution
By extracting code which does not need to be generic over a type parameter into non generic (or less generic functions) we can reduce how much code gets instantiated and passed to LLVM.
This gives a ~23% reduction in the output of LLVM IR (according to 'cargo llvm-lines -p examples --bin streaming-client) and reduces the compilation time of cargo build --release --bin streaming-client` by ~1.5s (21s to 19.5s). (Measured by just switching branches between this one and master to force recompilations)
Before
Lines Copies Function name
----- ------ -------------
Lines Copies Function name
----- ------ -------------
45542 (100%) 1250 (100%) (TOTAL)
3588 (7.9%) 3 (0.2%) tonic::client::grpc::Grpc<T>::streaming::{{closure}}
3320 (7.3%) 3 (0.2%) tonic::codec::encode::encode::{{closure}}
1282 (2.8%) 1 (0.1%) <tonic::codec::decode::Streaming<T> as futures_core::stream::Stream>::poll_next
1035 (2.3%) 19 (1.5%) core::result::Result<T,E>::map_err
997 (2.2%) 21 (1.7%) <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
924 (2.0%) 21 (1.7%) core::pin::Pin<&mut T>::map_unchecked_mut
872 (1.9%) 2 (0.2%) tonic::transport::channel::Channel::connect::{{closure}}
760 (1.7%) 95 (7.6%) core::pin::Pin<P>::new_unchecked
720 (1.6%) 2 (0.2%) streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
710 (1.6%) 1 (0.1%) tonic::codec::decode::Streaming<T>::decode_chunk
621 (1.4%) 8 (0.6%) std::thread::local::LocalKey<T>::try_with
618 (1.4%) 3 (0.2%) <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_trailers
605 (1.3%) 1 (0.1%) prost::encoding::decode_varint_slice
566 (1.2%) 13 (1.0%) <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
555 (1.2%) 3 (0.2%) <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
455 (1.0%) 7 (0.6%) tonic::request::Request<T>::map
430 (0.9%) 1 (0.1%) streaming_client::main::{{closure}}
420 (0.9%) 9 (0.7%) tonic::codec::encode::encode::{{closure}}::{{closure}}
417 (0.9%) 1 (0.1%) tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
414 (0.9%) 9 (0.7%) tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
373 (0.8%) 1 (0.1%) core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<futures_util::stream::once::Once<futures_util::future::ready::Ready<streaming_client::pb::EchoRequest>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
373 (0.8%) 1 (0.1%) core::ptr::drop_in_place<tonic::codec::encode::encode<tonic::codec::prost::ProstEncoder<streaming_client::pb::EchoRequest>,futures_util::stream::stream::map::Map<tokio_stream::stream_ext::throttle::Throttle<tokio_stream::stream_ext::map::Map<tokio_stream::iter::Iter<core::ops::range::Range<usize>>,streaming_client::echo_requests_iter::{{closure}}>>,core::result::Result<streaming_client::pb::EchoRequest,tonic::status::Status>::Ok>>::{{closure}}>
367 (0.8%) 1 (0.1%) streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}
After
Lines Copies Function name
----- ------ -------------
35425 (100%) 1074 (100%) (TOTAL)
1587 (4.5%) 3 (0.3%) tonic::client::grpc::Grpc<T>::streaming::{{closure}}
1035 (2.9%) 19 (1.8%) core::result::Result<T,E>::map_err
1024 (2.9%) 21 (2.0%) <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
924 (2.6%) 21 (2.0%) core::pin::Pin<&mut T>::map_unchecked_mut
912 (2.6%) 6 (0.6%) tonic::codec::encode::encode::{{closure}}::{{closure}}
872 (2.5%) 2 (0.2%) tonic::transport::channel::Channel::connect::{{closure}}
720 (2.0%) 2 (0.2%) streaming_client::pb::echo_client::EchoClient<T>::bidirectional_streaming_echo::{{closure}}
688 (1.9%) 86 (8.0%) core::pin::Pin<P>::new_unchecked
605 (1.7%) 1 (0.1%) prost::encoding::decode_varint_slice
591 (1.7%) 3 (0.3%) <futures_util::stream::try_stream::and_then::AndThen<St,Fut,F> as futures_core::stream::Stream>::poll_next
586 (1.7%) 13 (1.2%) <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
582 (1.6%) 3 (0.3%) <tonic::codec::encode::EncodeBody<S> as http_body::Body>::poll_data
455 (1.3%) 7 (0.7%) tonic::request::Request<T>::map
430 (1.2%) 1 (0.1%) streaming_client::main::{{closure}}
420 (1.2%) 9 (0.8%) tonic::codec::encode::encode::{{closure}}::{{closure}}::{{closure}}
417 (1.2%) 9 (0.8%) tonic::client::grpc::Grpc<T>::streaming::{{closure}}::{{closure}}
417 (1.2%) 1 (0.1%) tonic::transport::channel::endpoint::Endpoint::connect::{{closure}}
391 (1.1%) 5 (0.5%) std::thread::local::LocalKey<T>::try_with
367 (1.0%) 1 (0.1%) streaming_client::pb::echo_client::EchoClient<T>::server_streaming_echo::{{closure}}
347 (1.0%) 1 (0.1%) tokio::runtime::basic_scheduler::CoreGuard::block_on::{{closure}}
324 (0.9%) 2 (0.2%) tonic::transport::service::connection::Connection::connect::{{closure}}
320 (0.9%) 2 (0.2%) tokio::park::thread::CachedParkThread::block_on
314 (0.9%) 4 (0.4%) tokio::coop::with_budget::{{closure}}
312 (0.9%) 3 (0.3%) tonic::codec::encode::encode
310 (0.9%) 1 (0.1%) streaming_client::streaming_echo::{{closure}}
306 (0.9%) 1 (0.1%) streaming_client::bidirectional_streaming_echo_throttle::{{closure}}
306 (0.9%) 1 (0.1%) streaming_client::pb::echo_client::EchoClient<tonic::transport::channel::Channel>::connect::{{closure}}
Closes #849 (supersedes it)
Thanks for taking this on! It would be interesting to know why these changes reduce the llvm ir bloat.
Every combination of types that these functions gets called with gets a unique instantiation of binary code (monomorphized). So if we can shrink these generic functions, usually be moving code into less/non-generic functions rust/llvm can re-use those new functions for every generic function.
Did a couple of more tweaks to re-use code when the output type is the same.
With the new changes (and using a different/newer compiler) we go from 47011 to 33329 (-29%)
Nice! Seems like CI is failing?
Thanks!