Compilation performance is rather slow
Bug report
I find the compile times for simple ROS logic to be quite high. We have a system with around 100 topics set up with a CI toolchain etc. As build times increased I did some digging around and one of the issue is the compilation performance of ros function calls (eg create_subscription)
Required Info:
- Operating System:
- Ubuntu 20.04
- Installation type:
- apt install ros-galactic-dekstop
- Version or commit hash:
- ros-galactic-desktop/focal,now 0.9.3-2focal.20220430.204723 amd64
- DDS implementation:
- default
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
Minimal example that creates some subscriptions:
#include <geometry_msgs/msg/pose_array.hpp>
#include <geometry_msgs/msg/quaternion_stamped.hpp>
#include <geometry_msgs/msg/twist.hpp>
#include <rclcpp/node.hpp>
void test() {
rclcpp::Node node("test");
node.create_subscription<geometry_msgs::msg::Twist>("test1", rclcpp::QoS(1), [](geometry_msgs::msg::Twist::ConstSharedPtr) {});
node.create_subscription<geometry_msgs::msg::PoseArray>("test2", rclcpp::QoS(1), [](geometry_msgs::msg::PoseArray::ConstSharedPtr) {});
node.create_subscription<geometry_msgs::msg::QuaternionStamped>("test3", rclcpp::QoS(1), [](geometry_msgs::msg::QuaternionStamped::ConstSharedPtr) {});
}
compile:
time /usr/bin/c++ -isystem /opt/ros/galactic/include -Wall -Wextra -std=gnu++17 -c test.cc
Expected behavior
I don't now? Up to 2s max?
Actual behavior
On my system (intel [email protected] x8, 16G, SSD, gcc 9.4.0):
- gcc takes about 8 seconds.
- (clang about 4.5 seconds, significantly faster but I can not make the switch at this moment...)
The above example results in an object file that's 11M and contains about 17.000 symbols, so I think all the in-header and template definitions are exploding a bit.
Is there a work around that someone knows of, or is it possible to make a leaner header file?
Just FYI, with 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
# time /usr/bin/c++ -isystem /opt/ros/galactic/include -Wall -Wextra -std=gnu++17 -c test.cc
real 0m5.230s
user 0m4.966s
sys 0m0.243s
Is there a work around that someone knows of, or is it possible to make a leaner header file?
how about using rclc instead?
One workaround that can help is to not include rclcpp/rclcpp.hpp, which brings in everything, and instead just bring in certain header files from rclcpp that you need. For instance, if you just need rclcpp::Node, you can #include <rclcpp/node.hpp>.
That said, the most commonly used header files (like rclcpp/node.hpp) also tend to be the most expensive. It would be nice to reduce the overhead of some of them, but it is not totally clear how to do that while maintaining functionality.
Still an issue on humble. With the following command and the test.cc provided by OP :
#time /usr/bin/g++ -I/opt/ros/humble/include/{rosidl_runtime_cpp,rosidl_runtime_c,geometry_msgs,std_msgs,builtin_interfaces,rclcpp,rcutils,rcl,rmw,rcl_yaml_param_parser,rcl_interfaces,rosidl_typesupport_interface,rcpputils,tracetools,libstatistics_collector,statistics_msgs} -Wall -Wextra -std=gnu++17 -c test.cc
real 0m7.358s
user 0m6.188s
sys 0m1.119s
Without the subscribers (but with the includes) :
real 0m0.901s
user 0m0.702s
sys 0m0.200s
If you want the compile-time flexibility that templates give you, there's probably not much that can be done about this. However, if you structure your code right, it's not much of an issue in practice anyways: Have one file that contains the node and uses the callbacks to call into your application code over well-defined interfaces (functions, classes) that don't change often. Then your application code can be compiled independently from the node and the object files only need to be linked together.
A majority of the time is actually spent instantiating templates. Here's a flame graph from a run with Clang (-ftime-trace):

You can clearly see the three calls to create_subscription().
I guess one could implement some support for explicit instantiations here: Have a macro or something that creates the correct forward declaration with the template parameters used, so that the template wouldn't have to be instantiated from the node implementation. Then do explicit instantiations in some other .cpp and have the linker put them together. The expectation then would be that the .cpp with the explicit instantiations changes almost never, while the node implementation changes often, but since we have moved the template instantiations out of the node implementation, its compilation would be faster. However, with the approach I outlined above where you put your application logic outside of the node implementation, you basically achieve the same thing. Also, doing the explicit instantiation on the level of create_subscription() wouldn't be straightforward, since the callback is also a template parameter.
Reducing the amount of instantiated code also helps. With this I was able to get the time to < 1 s on my machine from 3.5 s before with clang++-14:
#include <geometry_msgs/msg/pose_array.hpp>
#include <geometry_msgs/msg/quaternion_stamped.hpp>
#include <geometry_msgs/msg/twist.hpp>
#define RCLCPP__NODE_IMPL_HPP_
#include <rclcpp/node.hpp>
#include <rclcpp/create_subscription.hpp>
void test()
{
rclcpp::Node node("test");
node.create_subscription<geometry_msgs::msg::Twist>("test1", rclcpp::QoS(1), std::function([](geometry_msgs::msg::Twist::ConstSharedPtr) {}));
node.create_subscription<geometry_msgs::msg::PoseArray>("test2", rclcpp::QoS(1), std::function([](geometry_msgs::msg::PoseArray::ConstSharedPtr) {}));
node.create_subscription<geometry_msgs::msg::QuaternionStamped>("test3", rclcpp::QoS(1), std::function([](geometry_msgs::msg::QuaternionStamped::ConstSharedPtr) {}));
}
This is something you could already do today. It'd break of course if you need something that is only in node_impl.hpp and cannot simply pulled in via a separate header (such as create_subscription.hpp).
We have noticed build time bloats with humble compared to galactic. Take minimal_subscriber/lambda.cpp from ros2/examples for example, it compiles 67% slower in humble (5.2s v.s. 3.1s).
Clang Build Analyzer shows that rclcpp::Node::create_subscription<$> takes 80% more time to instantiate (671ms v.s. 1206ms). rclcpp.hpp also takes twice as much time to include (975ms v.s. 2034ms).
For this particular example, one possible source of increasing build time might be the new Type Adaptation feature in humble. However, we have also seen considerably slower build time across the board in our project.
Complete reports: galactic-build-analysis-subscriber_lambda.txt humble-build-analysis-subscriber_lambda.txt
This issue has been mentioned on ROS Discourse. There might be relevant details there:
https://discourse.ros.org/t/compilation-bloat-issue/30345/1
This issue has been mentioned on ROS Discourse. There might be relevant details there:
https://discourse.ros.org/t/compilation-bloat-issue/30345/5
I managed to greatly reduce create_subscription's build time by using rclcpp::GenericSubscription instead then rclcpp::Serialization<MessageT>::deserialize_message in a callback. This is convenient if you simply want to prototype rapidly while reaching ROS 1 compilation speed without breaking your node to ROS 2/non-ROS 2 parts as was suggested by @ahans and you don't have dozens of subscriptions.
I assume this can be templatized for normal subscribers too by implementing an option for rclcpp::SubscriptionOptions() (something like .use_generic_for_faster_compilation = true; ) that switches to generic subscription behind the scene but I haven't tried it myself.