ros2_controllers
ros2_controllers copied to clipboard
Fix segfault in JTC and simplify checks
This PR fixes a segfault in JTC, we were getting a segfault every time we deactivate and then re-activate the controllers.
Attaching a debugger to the node shows read_state_from_hardware(state_desired_) causing the segfault, for some reason the variable in empty, digging more I don't see this variable being modified much, but I suspect it's related to the call to sample which set it to the default value output_state = trajectory_msgs::msg::JointTrajectoryPoint()
Are you changing interfaces between reactivation cycles?
No, we don't, it's the same controller
I am not convinced that this actually solves the cause of the issue. I would say it's more about hiding the real issue. I would wait to reproduce this in the tests and then know what is causing this...
This pull request is in conflict. Could you fix it @JafarAbdi?
Codecov Report
Merging #423 (154bbd1) into master (e7f9962) will decrease coverage by
3.31%
. The diff coverage is26.60%
.
@@ Coverage Diff @@
## master #423 +/- ##
==========================================
- Coverage 35.78% 32.48% -3.31%
==========================================
Files 189 7 -182
Lines 17570 665 -16905
Branches 11592 357 -11235
==========================================
- Hits 6287 216 -6071
+ Misses 994 157 -837
+ Partials 10289 292 -9997
Flag | Coverage Δ | |
---|---|---|
unittests | 32.48% <26.60%> (-3.31%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
...ontroller/test/test_load_diff_drive_controller.cpp | 12.50% <0.00%> (ø) |
|
diff_drive_controller/src/odometry.cpp | 42.16% <11.11%> (ø) |
|
diff_drive_controller/src/speed_limiter.cpp | 46.55% <11.11%> (ø) |
|
...ive_controller/test/test_diff_drive_controller.cpp | 17.62% <12.08%> (ø) |
|
...troller/include/diff_drive_controller/odometry.hpp | 20.00% <20.00%> (ø) |
|
...iff_drive_controller/src/diff_drive_controller.cpp | 39.22% <35.50%> (ø) |
|
...de/diff_drive_controller/diff_drive_controller.hpp | 100.00% <100.00%> (ø) |
|
...ontrollers/src/joint_group_velocity_controller.cpp | ||
...test/test_load_joint_group_velocity_controller.cpp | ||
...include/joint_trajectory_controller/tolerances.hpp | ||
... and 191 more |
I'm sorry, I forgot to mention that the test I added has an issue. I had a call with Denis, and he asked me to push my WIP test, so he could take a look and see if he can help finish it!
I tried to merge this with the old reactivation test, where actual trajectory execution is done. The complexity there is that we change the controller to use input time and period in the update method, so their parameters should be adjusted.
First the test need to replicate the described issue and after that show that fix is working. The position
, velocity
and acceleration
filed don't need to have the same size, i.e., they may be uninitialized if not used. But they should definitely be allocated in the configure
method. The question here is when after reactivation we loose the initialization in the memory of those filed. That's confusing for me – this is the reason I think we have much significant issue then this fix proposes.
@swiz23 noticed a segfault in the JTC specifically when aborting a motion, followed by deactivating, followed by activating where the segfault would occur. After looking at this PR, I think it very much has to do with the line output_state = trajectory_msgs::msg::JointTrajectoryPoint(). I think the flow of events occur as follows:
- When aborting, set_hold_position() is called. That essentially adds and writes an empty trajectory message in order to disregard the current one.
- When sample() is called in the update() loop,
state_desired_
gets overwritten with that empty trajectory point meaning it has an empty position, velocity, and acceleration array. - The controllers get deactivated
- The controllers get reactivated. When this happens,
read_state_frome_hardware
is called onstate_desired_
. Within that function, we try to index the position array withinstate_deisred_
as if it has the same size asdof_
when it in fact no longer does. This causes the segault.
The solution is to simply move the line defining the output_state within the sample() function in trajectory.cpp to after the last possible place that the function can return false -> here. I tested this out at least for the case that @swiz23 had issues with and verified that there was no longer a segfault.
Thanks @mechwiz for the fix. I'm closing this since someone else who currently can reproduce the issue have a fix