moveit2 icon indicating copy to clipboard operation
moveit2 copied to clipboard

Fuzzy-matching Trajectory Cache Injectable Traits refactor 🔥🔥

Open methylDragon opened this issue 1 year ago • 8 comments

As requested in the original PR, this PR refactors the TrajectoryCache to allow users to inject their own behaviors (which will allow them to cache on and sort by any arbitrary feature, as long as it can be represented as a series of warehouse_ros columns).

Depends on:

  • https://github.com/moveit/moveit2/pull/2944

Builds on top of:

  • https://github.com/moveit/moveit2/pull/2838

TODOs:

  • [ ] Fix integration-test
  • [x] Formatting pass (will be done after review)
  • [x] Fix tutorials
  • [x] Fix bugs

Preamble

I apologize that this PR is even larger than the one it builds on top of. Most of the added lines are docstrings or tests, and boilerplate to support the behavior injection pattern this refactor is concerned with.

On the other hand, the average file length has decreased, so the code is MUCH more modular and hopefully easy to digest.

I can't split up this PR into multiple smaller ones since technically speaking, in order to preserve cache functionality, all the feature extractors and cache insert policies introduced in this PR will need to go in together.

I would suggest looking at the tests to aid in review (they run the gamut of unit and integration tests).

You can also build and run the example:

  • https://github.com/moveit/moveit2_tutorials/pull/940

PS: If the size is still prohibitive, and we are okay with having partial implementations live in moveit2 while reviews are pending, let me know and I will split up the PR into smaller PRs (though I suspect at that point, that a logical number of splits might end up being somewhere close to 5-10 PRs.)

Navigating This PR

Since this PR is so large, here is a proposed order for comprehending the PR.

  • Fully read this PR description and the example code in the description
  • Build and run the demo to convince yourself that the cache works (instructions in that PR)
    • https://github.com/moveit/moveit2_tutorials/pull/940
  • Look at the new interfaces introduced
    • features/features_interface.hpp, cache_insert_policies/cache_insert_policy_interface.hpp
  • Then look at their implementations and tests
    • features/, cache_insert_policies/
  • Then look at the main TrajectoryCache class
    • trajectory_cache.hpp, trajectory_cache.cpp
  • Then tie it all together by looking at the example usage of the classes in this PR in the demo code in https://github.com/moveit/moveit2_tutorials/pull/940

Additionally, all docstrings are filled, including file ones, as appropriate. So hopefully any clarificatory questions would have already been pre-empted and answered.

Description

This PR builds on top of the fuzzy-matching Trajectory Cache introduced in:

  • https://github.com/moveit/moveit2/pull/2838
  • Once that is merged, the diff for this PR should reduce significantly.

The implementation in that PR coupled the cache tightly with assumptions about what features to extract and sort by (i.e., a specific set of start/goal constraints, and pruning by execution time.)

This means that users who might want to store different features or a subset of those features, or who might want to fetch and prune on different features (e.g., minimum jerk, path length, etc.) will be unable to.

This PR refactors the cache to allow users to inject their own feature extractors and pruning/insertion logic!

This is done by introducing two new abstract classes that can be injected into the cache methods, acting a lot like class "traits":

  • FeaturesInterface<>: Governs what query/metadata items to extract and append to the warehouse_ros query.
  • CacheInserterInterface<>: Governs the pruning and insertion logic.

For more details on FeaturesInterface, see the Docstrings: https://github.com/moveit/moveit2/blob/cc0feb37cf423076e133523ccdbbf3038b84a01e/moveit_ros/trajectory_cache/include/moveit/trajectory_cache/features/features_interface.hpp

Some notes:

  • I decided to not go with lambdas, because there should be tight coupling between the query and metadata insertion logic for a particular feature.
  • Similarly, cache insertion logic heavily benefits from being stateful, and coupling those chunks of logic together.

Example

In other words, before. the cache was used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(*move_group_interface, robot_name, motion_plan_req_msg,
                                            _cache_start_match_tolerance, _cache_goal_match_tolerance,
                                            /*sort_by=*/"execution_time_s", /*ascending=*/true);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  traj_cache->insertTrajectory(
      *interface, robot_name, std::move(plan_req_msg), std::move(res->result.trajectory),
      rclcpp::Duration(res->result.trajectory.joint_trajectory.points.back().time_from_start).seconds(),
      res->result.planning_time, /*delete_worse_trajectories=*/true);
}

Now the cache is used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>> features;
features.emplace_back(std::make_unique<WorkspaceFeatures>());
features.emplace_back(std::make_unique<StartStateJointStateFeatures>(start_tolerance));
// ...

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(move_group, robot_name, motion_plan_req_msg,
                                            /*features=*/features,
                                            /*sort_by=*/TrajectoryCache::getDefaultSortFeature(),
                                            /*ascending=*/true);

// Or more simply, if you want the same feature set as before the refactor, instead of painfully listing the features one by one:
// Type: std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>>
auto default_features = TrajectoryCache::getDefaultFeatures(_cache_start_match_tolerance, _cache_goal_match_tolerance);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  //
  // NOTE: Now instead of passing a trajectory, pass the plan result,
  // it'll contain the execution time and planning time we need!
  //
  // cache_inserter is a CacheInserterInterface<MotionPlanRequest, MotionPlanResponse, msg::RobotTrajectory>
  // It will tell the trajectory cache:
  //   - how to fetch "matching entries"
  //   - how to determine if they should be pruned,
  //   - how to determine when to insert the candidate cache entry
  //   - and what metadata to attach 
  //
  // additional_features allows a user to further add more metadata features for use with fetching
  // though they will not be considered by the cache_inserter
  traj_cache->insertTrajectory(
      move_group, robot_name, std::move(plan_req_msg), std::move(plan),
      /*cache_inserter=*/BestSeenExecutionTimePolicy(),
      /*prune_worse_trajectories=*/true, /*additional_features=*/{});
}

See the motion plan request features here: https://github.com/moveit/moveit2/pull/2941/commits/79b7f95660180d6554d1c6f153ada5a4ea339595

The Feature Contract

Users may use an instance of FeaturesInterface<> to fetch a cache entry only if it was supported by the CacheInserterInterface<> instance that they used (or on insert, the feature was added in additional_features).

I could not think of a way to create a coupling between uses of the cache inserters and the features. This is the cost of generality and allowing users to inject arbitrary logic into the cache.

As such, users must take care to look at the docs of the cache inserter to see what features can be used with them.

(This can be mitigated by adding helper methods to get "standard" bundles of features and a "standard" CacheInserter.)

Bonus

I added new features to the default feature extractors set and cleaned up some utilities!

There are now FeaturesInterface<> implementations that can handle path and trajectory constraints! Multiple goal constraints are also handled (at the cost of increased cardinality.)

I also added "atomic" features that wrap the basic ops you can do with warehouse_ros, to allow users to specify their own metadata to tag cache entries with.

Here: https://github.com/moveit/moveit2/pull/2941/commits/cc0feb37cf423076e133523ccdbbf3038b84a01e

Pre-Existing Support

The package now provides some starter implementations that covers most general cases of motion planning.

For more information, see the implementations of:

  • FeaturesInterface
  • CacheInsertPolicyInterface

Cache Keying Features

The following are features of the plan request and response that you can key the cache on.

These support separately configurable fuzzy lookup on start and goal conditions! Additionally, these features "canonicalize" the inputs to reduce the cardinality of the cache, increasing the chances of cache hits. (e.g., restating poses relative to the planning frame).

Supported Features:

  • "Start"
    • WorkspaceFeatures: Planning workspace
    • StartStateJointStateFeatures: Starting robot joint state
  • "Goal"
    • MaxSpeedAndAccelerationFeatures: Max velocity, acceleration, and cartesian speed limits
    • GoalConstraintsFeatures: Planning request goal_constraints
      • This includes ALL joint, position, and orientation constraints (but not constraint regions)!
    • PathConstraintsFeatures: Planning request path_constraints
    • TrajectoryConstraintsFeatures: Planning request trajectory_constraints

Additionally, support for user-specified features are provided for query-only or cache metadata tagging constant features.

Similar support exists for the cartesian variants of these.

Cache Insert and Pruning Policies

The following are cache insertion and pruning policies to govern when cache entries are inserted, and how they are (optionally) pruned.

Supported Cache Insert Policies:

  • BestSeenExecutionTimePolicy: Only insert best seen execution time, optionally prune on best execution time.
  • AlwaysInsertNeverPrunePolicy: Always insert, never prune

Caveat

The increased functionality is now no longer 100% covered. But I tried adding tests where I had time to. I am unfortunately running out of time to iterate on this, so let's be targeted with the improvements!

methylDragon avatar Jul 31 '24 08:07 methylDragon

Quick question: clang-format/tidy is erroneously editing the template parameters.

How do I get around it? Related bug: https://github.com/llvm/llvm-project/issues/46097

Is it acceptable to throw in NOLINT directives?

methylDragon avatar Aug 01 '24 07:08 methylDragon

You'd be far from the first use-case of NOLINT, but would recommend judicious use of it.

stephanie-eng avatar Aug 02 '24 03:08 stephanie-eng

I am opening this PR because it is now complete enough to warrant a review.

It is only missing a refactor of the integration test. I added unit/sanity tests (though they unfortunately don't cover every implementation detail).

For everything else, please see the description, and the companion draft examples PR:

  • https://github.com/moveit/moveit2_tutorials/pull/940

I am also running out of time to iterate on this, since I spent too much time on the refactor... It would be nice if we could get this in and RFE any minor improvements instead, since it would probably be better for MoveIt to get this feature in rather than not! ✨ 🙇

In either case, I think this should help answer most if not all queries from the parent PR!

methylDragon avatar Aug 07 '24 08:08 methylDragon

Rebased!

methylDragon avatar Sep 11 '24 06:09 methylDragon

CI is failing due to the jump_threshold arg being called on computeCartesianPlan. But the cache needs to consider it due to the field's presence in the GetCartesianPath service... How do I disable the warning?

methylDragon avatar Sep 11 '24 20:09 methylDragon

This pull request is in conflict. Could you fix it @methylDragon?

mergify[bot] avatar Sep 16 '24 14:09 mergify[bot]

Are we good to merge? (: @sjahr

Quoting from before:

CI is failing due to the jump_threshold arg being called on computeCartesianPlan. But the cache needs to consider it due to the field's presence in the GetCartesianPath service... How do I disable the warning?

methylDragon avatar Oct 17 '24 23:10 methylDragon

:see_no_evil: I've been busy but maybe I'll get to it on the weekend, sorry. The API of computeCartesianPlan got changed https://github.com/moveit/moveit2/pull/2916. Can you adapt the cache to the new API? I think you don't need to change the service for

sjahr avatar Oct 18 '24 08:10 sjahr

Small poke @sjahr (:

methylDragon avatar Nov 07 '24 23:11 methylDragon

Whoops, I did not read the log correctly. Any idea what might call this error:

$ ( source /home/runner/work/moveit2/moveit2/.work/upstream_ws/install/setup.bash && cd /home/runner/work/moveit2/moveit2/.work/target_ws && colcon test-result --verbose; )
build/moveit_ros_trajectory_cache/Testing/20241107-2346/Test.xml: 7 tests, 0 errors, 1 failure, 0 skipped
- test_utils
  <<< failure message
    -- run_test.py: invoking following command in '/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test':
     - /home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test/test_utils --gtest_output=xml:/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test_results/moveit_ros_trajectory_cache/test_utils.gtest.xml
    [==========] Running 3 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from WarehouseFixture
    [ RUN      ] WarehouseFixture.QueryAppendCenterWithToleranceWorks
Error: ROR] [1731023211.703482413] [warehouse_ros_sqlite.query]: Preparing Query failed: no such column: M_unrelated_metadata
    [       OK ] WarehouseFixture.QueryAppendCenterWithToleranceWorks (14 ms)
    [----------] 1 test from WarehouseFixture (14 ms total)
    
    [----------] 2 tests from TestUtils
    [ RUN      ] TestUtils.GetExecutionTimeWorks
    [       OK ] TestUtils.GetExecutionTimeWorks (0 ms)
    [ RUN      ] TestUtils.ConstraintSortingWorks
    [       OK ] TestUtils.ConstraintSortingWorks (0 ms)
    [----------] 2 tests from TestUtils (0 ms total)
    
    [----------] Global test environment tear-down
    [==========] 3 tests from 2 test suites ran. (14 ms total)

https://github.com/moveit/moveit2/actions/runs/11733210217/job/32686999595

This is coming from this test of the warehouse_ros utils: https://github.com/moveit/moveit2/blob/85754862c88536de7fd2a9d7b8d12d8f2691b605/moveit_ros/trajectory_cache/test/utils/test_utils.cpp#L45-L47

Which double checks that a query for a metadata key that doesn't exist returns empty, so this is working as expected. I think the emission is just a warning from warehouse_ros since we're deliberately looking up a non-existent column in the test.


If we're instead talking this segfault on cleanup:

    Stack trace (most recent call last):
    #15   Object "", at 0xffffffffffffffff, in 
    #14   Object "/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test/test_utils", at 0x55def50858a4, in _start
    #13   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f535e3f, in __libc_start_main
    #12   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f535d96, in 
    #11   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f55160f, in exit
    #10   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f551494, in 
    #9    Object "/home/runner/work/moveit2/moveit2/.work/target_ws/install/moveit_ros_warehouse/lib/libmoveit_warehouse.so.2.11.0", at 0x7fe64f1fa814, in std::unique_ptr<warehouse_ros::DatabaseLoader, std::default_delete<warehouse_ros::DatabaseLoader> >::~unique_ptr()
    #8    Object "/opt/ros/humble/lib/libwarehouse_ros.so", at 0x7fe64ead80c1, in warehouse_ros::DatabaseLoader::~DatabaseLoader()
    #7    Object "/opt/ros/humble/lib/libwarehouse_ros.so", at 0x7fe64eae1748, in 
    #6    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a181d, in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader()
    #5    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a177a, in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders()
    #4    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a15b5, in class_loader::MultiLibraryClassLoader::unloadLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    #3    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a0bd2, in class_loader::ClassLoader::unloadLibraryInternal(bool)
    #2    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a0575, in class_loader::impl::unloadLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, class_loader::ClassLoader*)
    #1    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e99e111, in class_loader::impl::findLoadedLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    #0    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f6a5a92, in 
    Segmentation fault (Address not mapped to object [0x55dba8977341])

I think it's an issue with the warehouse_ros database loader, specific to humble. So not an issue with the tests here. You can see that I'm not doing anything special with the warehouse_ros fixture the test uses

methylDragon avatar Nov 09 '24 04:11 methylDragon

@sjahr

Sorry to keep poking; is it possible to exclude the trajectory cache targets/tests from the humble CI? Or is there anything that I need to do on my end?

methylDragon avatar Nov 15 '24 05:11 methylDragon

Codecov Report

Attention: Patch coverage is 84.29434% with 286 lines in your changes missing coverage. Please review.

Project coverage is 45.61%. Comparing base (c3a2edc) to head (6b74a24). Report is 50 commits behind head on main.

Files with missing lines Patch % Lines
...veit_ros/trajectory_cache/src/trajectory_cache.cpp 0.00% 123 Missing :warning:
moveit_ros/trajectory_cache/src/utils/utils.cpp 76.48% 64 Missing :warning:
...nsert_policies/best_seen_execution_time_policy.cpp 85.58% 30 Missing :warning:
...c/features/get_cartesian_path_request_features.cpp 83.14% 28 Missing :warning:
...sert_policies/always_insert_never_prune_policy.cpp 93.01% 10 Missing :warning:
...ache/src/features/motion_plan_request_features.cpp 93.71% 8 Missing :warning:
...est_seen_execution_time_policy_with_move_group.cpp 97.33% 5 Missing :warning:
...ways_insert_never_prune_policy_with_move_group.cpp 97.61% 4 Missing :warning:
...jectory_cache/test/fixtures/move_group_fixture.cpp 85.00% 1 Missing and 2 partials :warning:
...ajectory_cache/test/fixtures/warehouse_fixture.cpp 83.34% 1 Missing and 2 partials :warning:
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2941      +/-   ##
==========================================
+ Coverage   44.17%   45.61%   +1.45%     
==========================================
  Files         698      714      +16     
  Lines       61573    62282     +709     
  Branches     7457     7527      +70     
==========================================
+ Hits        27191    28406    +1215     
+ Misses      34215    33709     -506     
  Partials      167      167              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Dec 27 '24 10:12 codecov-commenter

@sea-bass apologies for the delay. I've iterated on the PR and responded to/implemented all review comments!

methylDragon avatar Dec 27 '24 13:12 methylDragon

Humble CI is failing tests (likely due to missing backports...)

EDIT: Adding a mitigation...

methylDragon avatar Dec 27 '24 13:12 methylDragon

@methylDragon seems there is a compilation error now.

/home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_ros/trajectory_cache/test/cache_insert_policies/test_best_seen_execution_time_policy_with_move_group.cpp:432:19: error: ‘struct moveit_msgs::srv::GetCartesianPath_Response_<std::allocator<void> >’ has no member named ‘trajectory’
    432 |       longer_plan.trajectory.joint_trajectory.points.back().time_from_start.sec = 100;
        |                   ^~~~~~~~~~

I think it just requires changing .trajectory to .solution for the tests that deal with Cartesian paths.

sea-bass avatar Dec 29 '24 04:12 sea-bass

There is also a clang-tidy failure with the tf arg passed into restateInNewFrame() -- even though it's a shared pointer, it's asking to use const ref which is not necessary.

Anyways, feel free to resolve this however you see fit and otherwise LGTM.

Edit: Do you think this will help? https://github.com/moveit/moveit2/pull/3181

sea-bass avatar Dec 29 '24 04:12 sea-bass

There is also a clang-tidy failure with the tf arg passed into restateInNewFrame() -- even though it's a shared pointer, it's asking to use const ref which is not necessary.

Anyways, feel free to resolve this however you see fit and otherwise LGTM.

Edit: Do you think this will help? https://github.com/moveit/moveit2/pull/3181

Thanks for that PR! I think I'd go with constref of the shared pointer for now.

I was doing the pass by value so it would do a copy to extend the lifespan, but I think currently all callsites guarantee that the originating pointer will outlive the call anyway, so moving to constref would let us save on the reference count increment while also allowing us to copy internally (to extend the lifespan) if in the future we deem it necessary.

Notably I see the same sort of thing (taking constref of shared_ptrs) across the repo anyway.

methylDragon avatar Dec 29 '24 05:12 methylDragon

Sounds good. Also seeing test_best_seen_execution_time_policy_with_move_group.cpp failing in some jobs?

sea-bass avatar Dec 29 '24 05:12 sea-bass

Sounds good. Also seeing test_best_seen_execution_time_policy_with_move_group.cpp failing in some jobs?

Let me try with the latest commit. Let's see how CI responds.

methylDragon avatar Dec 29 '24 06:12 methylDragon

Latest commit seems to look good -- I am rerunning CI a few times just to confirm. Thanks for working on this, @methylDragon!

sea-bass avatar Dec 29 '24 13:12 sea-bass

Ah, the 2nd run of CI failed in the rolling/clang-tidy job, actually.

I'd be a bit wary of merging with these flaky tests. Any ideas on how to improve this behavior? And what do you think was merged upstream to be causing this?

Maybe https://github.com/moveit/moveit2/pull/3143?

sea-bass avatar Dec 29 '24 13:12 sea-bass

Ah, the 2nd run of CI failed in the rolling/clang-tidy job, actually.

I'd be a bit wary of merging with these flaky tests. Any ideas on how to improve this behavior? And what do you think was merged upstream to be causing this?

Maybe #3143?

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/

The test this PR introduces basically checks if the nodes involved with the launchfile all terminated nicely, and also had the appropriate number of PASS printouts. (No way around it otherwise since move_group requires the robot node and ros2_control to work)

Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

methylDragon avatar Dec 29 '24 16:12 methylDragon

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/

Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

Those ros2_control segfaults in integration tests are unfortunately false alarms in all these tests -- there is an actual assertion failing here: https://github.com/moveit/moveit2/actions/runs/12533008564/job/34958234846?pr=2941#step:11:10051

    [test_best_seen_execution_time_policy_with_move_group-8] /home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_ros/trajectory_cache/test/cache_insert_policies/test_best_seen_execution_time_policy_with_move_group.cpp:170: Failure
    [test_best_seen_execution_time_policy_with_move_group-8] Value of: ret
    [test_best_seen_execution_time_policy_with_move_group-8]   Actual: false
    [test_best_seen_execution_time_policy_with_move_group-8] Expected: true
    [test_best_seen_execution_time_policy_with_move_group-8] BestSeenExecutionTimePolicy: Empty joint trajectory points.
    [test_best_seen_execution_time_policy_with_move_group-8] 
    [test_best_seen_execution_time_policy_with_move_group-8] [  FAILED  ] MoveGroupFixture.BestSeenExecutionTimePolicyWorks (5209 ms)

sea-bass avatar Dec 29 '24 19:12 sea-bass

It's the ros2_control_node itself that's getting a segfault though. I doubt this is related to moveit :/ Test is failing because ros2_control's node segfaults and returns a bad exit code. This is a different failure mode from the one my commit fixed.

Those ros2_control segfaults in integration tests are unfortunately false alarms in all these tests -- there is an actual assertion failing here: https://github.com/moveit/moveit2/actions/runs/12533008564/job/34958234846?pr=2941#step:11:10051

    [test_best_seen_execution_time_policy_with_move_group-8] /home/runner/work/moveit2/moveit2/.work/target_ws/src/moveit2/moveit_ros/trajectory_cache/test/cache_insert_policies/test_best_seen_execution_time_policy_with_move_group.cpp:170: Failure
    [test_best_seen_execution_time_policy_with_move_group-8] Value of: ret
    [test_best_seen_execution_time_policy_with_move_group-8]   Actual: false
    [test_best_seen_execution_time_policy_with_move_group-8] Expected: true
    [test_best_seen_execution_time_policy_with_move_group-8] BestSeenExecutionTimePolicy: Empty joint trajectory points.
    [test_best_seen_execution_time_policy_with_move_group-8] 
    [test_best_seen_execution_time_policy_with_move_group-8] [  FAILED  ] MoveGroupFixture.BestSeenExecutionTimePolicyWorks (5209 ms)

Ah! I see, I think this is an error on my part while writing the tests. Let's try with 6b19476dc1f4faec53eb0e28bff4d4b4f0485a7e

Almost to the finish line! Thanks for the prompt reviews :bow:

methylDragon avatar Dec 30 '24 04:12 methylDragon

Seems the lingering flaky failures are now based on exit codes. You could consider trying something like this:

https://github.com/moveit/moveit2/pull/3124/commits/a401e13f902e8a5b17ed40d7efdfcb535f7f1779

sea-bass avatar Dec 30 '24 06:12 sea-bass

Seems the lingering flaky failures are now based on exit codes. You could consider trying something like this:

a401e13

Hmm, okay. Less ideal since it feels like papering over the issue, but okay since technically the segfaults are unrelated. 03e0dc494e5bfab38e2a164043a87d94b9167169

methylDragon avatar Dec 30 '24 12:12 methylDragon

Aw man, latest run has a real failure in the test again...

sea-bass avatar Dec 30 '24 15:12 sea-bass

Oh man, let me continue investigating. Sorry for the repeated back and forths..

methylDragon avatar Dec 30 '24 21:12 methylDragon

d78fd2267b5d61ec828d119a22a0e881e3b228b1 is the only thing I can think of, otherwise I will need to const define any trajectories in the test (which will be quite wordy)

EDIT: Looks like the CI failure is a different flaky test now (pilz). Unfortunately my local setup makes it difficult to replicate CI, but I did run it a couple times and didn't see the same failure we saw before.

methylDragon avatar Dec 30 '24 22:12 methylDragon

Seems fine now -- in 5 tries, only got another exit code related failure on a Pilz test, which is for sure not linked to this PR.

Thanks for all the hard work!

Thanks for the in-depth reviews! They helped test stability and fixing some minor bugs in the code.

After this is merged the only thing left for this entire feature is the tutorial:

  • https://github.com/moveit/moveit2_tutorials/pull/940

Pinging @sjahr for re-approval (from stale review) and merge!

methylDragon avatar Dec 31 '24 00:12 methylDragon