
Draft: Support timed locking in PlanningSceneMonitor

Open henningkayser opened this issue 3 years ago • 6 comments

This replaces the shared_mutex of the PlanningSceneMonitor with a shared_timed_mutex that allows specifying a timeout for lock attempts. The scoped planning scene wrappers LockedPlanningSceneRO and LockedPlanningSceneRW now support passing a timeout duration and may fail with a TryLockTimeoutException if the lock can't be acquired in time. This is very useful for making sure the calling thread is not blocked for too long in case of concurrency issues. Additionally, the new scoped lock wrapper TimedLock allows tracking lock sources (i.e. caller names), start time, waiting time, and the duration the lock is held. An optional DiagnosticStatus publisher can be enabled for debugging these lock events at runtime.
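A rough sketch of what this could look like from the caller's side; the exact constructor overload, exception namespace, and the 5 ms timeout below are assumptions based on this description, not necessarily the final API:

```cpp
#include <chrono>
#include <moveit/planning_scene_monitor/planning_scene_monitor.h>

using namespace std::chrono_literals;

// Returns true if the collision check ran, false if the read lock timed out.
bool checkCollisionNonBlocking(const planning_scene_monitor::PlanningSceneMonitorPtr& psm)
{
  try
  {
    // Assumed overload added by this PR: pass a timeout for acquiring the read lock.
    planning_scene_monitor::LockedPlanningSceneRO scene(psm, 5ms);
    if (scene->isStateColliding(scene->getCurrentState()))
    {
      // ... handle the collision ...
    }
    return true;
  }
  catch (const planning_scene_monitor::TryLockTimeoutException& e)  // namespace assumed
  {
    // The lock could not be acquired within 5 ms -- skip this cycle instead of blocking.
    return false;
  }
}
```

On timeout, a caller in a time-constrained loop can simply skip the cycle instead of blocking.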

The scope-less locking functions (lockSceneRead(), etc.) are deprecated since they are error-prone and don't work well with the diagnostics feature.
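For comparison, a minimal illustration of why the scope-less calls are easy to misuse next to the RAII wrapper (the early return in the first function is the intentional bug):

```cpp
#include <moveit/planning_scene_monitor/planning_scene_monitor.h>

// Deprecated, error-prone pattern: every exit path must remember to unlock.
void scopelessRead(const planning_scene_monitor::PlanningSceneMonitorPtr& psm)
{
  psm->lockSceneRead();
  if (!psm->getPlanningScene())
    return;  // bug: the read lock is never released on this path
  // ... read from the scene ...
  psm->unlockSceneRead();
}

// RAII alternative: the lock is released automatically when `scene` leaves scope.
void scopedRead(const planning_scene_monitor::PlanningSceneMonitorPtr& psm)
{
  planning_scene_monitor::LockedPlanningSceneRO scene(psm);
  if (!scene)
    return;  // no manual cleanup needed, the wrapper unlocks in its destructor
  // ... read from the scene ...
}
```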

My motivation for this was a lengthy debugging session for getting rid of spurious deadlocks and performance issues. Especially in scenarios where the planning scene monitor is used in a time-constrained loop (servo collision checking, online sampling for controller commands), it's critical to understand where time is being spent and which threads are being blocked. I'm not convinced that the DiagnosticStatus publisher should be merged as is (moveit_profiler would do a similar job), but the timed locking feature seems really useful and safer to me.

Left TODOs:

  • [ ] Handle locking of octomap_monitor
  • [ ] Keep track of lock owners and print names with warning/error/exception
  • [ ] Move boilerplate code into separate header

henningkayser avatar Jul 29 '21 21:07 henningkayser

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

AndyZe avatar Jul 30 '21 13:07 AndyZe

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

Good point! I'd be surprised if there was a measurable difference between shared_timed_mutex (without timeouts) and shared_mutex since they both implement the same reader/writer behavior. The wrapper class TimedLock, however, could actually have a performance impact; we should make sure that it doesn't.

henningkayser avatar Jul 30 '21 16:07 henningkayser

Just a high-level comment. I wonder what the overhead of a shared_timed_mutex is compared to a regular shared_mutex. Answers here suggest it can be significant. Should benchmark.

Do you have an idea how to benchmark this? If so, we could integrate it into https://github.com/ros-planning/moveit/issues/2717

felixvd avatar Jul 31 '21 00:07 felixvd

@captain-yoshi and @felixvd here's a script that is good for benchmarking repeated, high-frequency reads of the planning scene. (https://github.com/ros-planning/moveit/pull/2698#issuecomment-853091796)

To properly test this PR, some writes to the planning scene should also be included (to mimic normal operation). For example, updating joint angles at 100-200 Hz or so.
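This isn't the linked script, but a rough standalone sketch of that setup under ROS 1: one thread randomizes the current state at ~100 Hz while the main thread hammers read locks and a collision query. The node name, iteration count, and monitor startup calls are illustrative and assume a robot_description and a joint state source are available:

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>
#include <ros/ros.h>
#include <moveit/planning_scene_monitor/planning_scene_monitor.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "psm_lock_benchmark");
  ros::AsyncSpinner spinner(1);
  spinner.start();

  auto psm = std::make_shared<planning_scene_monitor::PlanningSceneMonitor>("robot_description");
  psm->startSceneMonitor();
  psm->startStateMonitor();

  std::atomic<bool> done{ false };

  // Writer: mimic scene updates by randomizing the current state at ~100 Hz.
  std::thread writer([&]() {
    ros::Rate rate(100);
    while (!done && ros::ok())
    {
      {
        planning_scene_monitor::LockedPlanningSceneRW scene(psm);
        auto& state = scene->getCurrentStateNonConst();
        state.setToRandomPositions();
        state.update();
      }
      rate.sleep();
    }
  });

  // Reader: time repeated, high-frequency read locks plus a collision query.
  const int iterations = 100000;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations && ros::ok(); ++i)
  {
    planning_scene_monitor::LockedPlanningSceneRO scene(psm);
    scene->isStateColliding(scene->getCurrentState());
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;

  done = true;
  writer.join();

  ROS_INFO_STREAM("Average read iteration: "
                  << std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count() / iterations
                  << " ns");
  return 0;
}
```

Running this once against the current shared_mutex implementation and once against this PR's branch should show whether the timed locking adds measurable overhead under realistic contention.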

AndyZe avatar Jul 31 '21 17:07 AndyZe

I'm also curious how std::mutex would perform compared to shared_mutex.
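A single-threaded baseline along these lines would at least show the uncontended per-lock cost of each mutex type; it deliberately ignores contention, which the mixed read/write scenario above is better suited to measure:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <shared_mutex>

// Time N uncontended lock/unlock cycles for a given mutex/lock combination.
template <typename Mutex, typename Lock>
double benchmark(std::size_t iterations)
{
  Mutex m;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i)
  {
    Lock lock(m);  // lock on construction, unlock on destruction
  }
  const std::chrono::duration<double, std::nano> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count() / iterations;
}

int main()
{
  constexpr std::size_t N = 10'000'000;
  std::printf("std::mutex              : %.1f ns/lock\n",
              benchmark<std::mutex, std::lock_guard<std::mutex>>(N));
  std::printf("std::shared_mutex       : %.1f ns/lock\n",
              benchmark<std::shared_mutex, std::shared_lock<std::shared_mutex>>(N));
  std::printf("std::shared_timed_mutex : %.1f ns/lock\n",
              benchmark<std::shared_timed_mutex, std::shared_lock<std::shared_timed_mutex>>(N));
  return 0;
}
```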

AndyZe avatar Jul 31 '21 18:07 AndyZe

From your description it looks like there are some things left to do to clean up this patch.

I only skimmed through the changes for now. My main concern is backward compatibility. Is this patch supposed to be API-compatible except for deprecating the non-RAII locking API (which I definitely agree with)?

v4hn avatar Aug 13 '21 11:08 v4hn