ros2_controllers icon indicating copy to clipboard operation
ros2_controllers copied to clipboard

Scaled jtc

Open fmauch opened this issue 1 year ago • 22 comments

This is my take2 on #301.

I plan to finish this until Friday, @firesurfer feel free to poke me :-)

Currently missing

  • [ ] tests for scaled execution
  • [x] There are some ToDos in the documentation:
    • [x] Algorithmics
    • [x] Handling tolerances

Things that will get changed, as discussed in the latest working group meeting

  • [x] Allowing scaling factors above 1 ~~might get postponed to a later point.~~
  • [x] Report the scaling factor in the controller state
  • [x] Use a custom datatype for the scaling_factor subscriber (std_msgs are deprecated)
  • [x] Change subscription to transient_local. This will require documentation, as a standard ros2 topic pub will not work with this.

fmauch avatar Jul 03 '24 18:07 fmauch

Thoughts on tests I should implement:

  • Execution with scaling factor of 0.5 takes twice as long (and does not fail, when no tolerances are set)
  • Execution with scaling < 1 and goal time tolerance leads to an early halt and failed execution
  • Execution with scaling < 1 and path tolerances set should not fail (if goal_time_tolerance is large enough)
  • Execution with scaling > 1 should not fail independent of the tolerances set <-- is that true?

fmauch avatar Jul 03 '24 19:07 fmauch

hi @fmauch Great to see this being worked on again. I added some comments to the code.

Regarding the tests:

Execution with scaling factor of 0.5 takes twice as long (and does not fail, when no tolerances are set)

Makes sense

Execution with scaling < 1 and goal time tolerance leads to an early halt and failed execution

Isn't this what we patched in: https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/pull/882. As far as I can tell you also backported this patch. (And I still think the goal time should be also scaled ;) )

Execution with scaling > 1 should not fail independent of the tolerances set <-- is that true?

Same as above. From heavily using this controller in our system I have the opinion that in most use cases it makes sense (and is what a user would expect) that the goal time should also be scaled. But perhaps I understood your suggestion wrong.

firesurfer avatar Jul 04 '24 05:07 firesurfer

From heavily using this controller in our system I have the opinion that in most use cases it makes sense (and is what a user would expect) that the goal time should also be scaled. But perhaps I understood your suggestion wrong.

Thank you for your input. That's why I put my brain dump on that here, so we can discuss this :-) And that's also why I didn't write the documentation on that, yet.

I'll wrap my head around this a little more and try to get a better feeling for the current behavior, then come back to this.

fmauch avatar Jul 04 '24 06:07 fmauch

I've took some time to think about the controller's behavior and currently, it should be like this, right @firesurfer?

  • Execution with scaling factor of 0.5 takes twice as long.
  • Execution with scaling factor of 2 should take 50% of the time.
  • Execution with scaling factor of 1 should take as long as expected
  • Independent of goal and path tolerances the action should not fail -- the goal time is scaled, as well.

The latter is the main point we discussed in https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/pull/882. I am still fine with that approach, although I am not sure whether this covers every use case.

But with this I think I can finish the documentation and write some tests tomorrow.


Edit: Changed the third bullet point scaling 0 to 1. That was a mistake when writing this.

fmauch avatar Jul 04 '24 20:07 fmauch

@fmauch

Execution with scaling factor of 0.5 takes twice as long.

Yes. we sample the trajectory half as fast

Execution with scaling factor of 2 should take 50% of the time.

Yes we sample the trajectory twice as fast

Execution with scaling factor of 0 should take as long as expected

I would call this paused. I have the feeling this is one of the major usecases of this controller

Independent of goal and path tolerances the action should not fail -- the goal time is scaled, as well.

In case the trajectory is within the (scaled) goal time, it should not fail -> yes. Otherwise the combination of a scaled trajectory + a set goal time tolerance would always fail. Is there a specific use case you have in mind that would not be covered by that ?

firesurfer avatar Jul 05 '24 06:07 firesurfer

My concern is mainly this:

If you have an application and you execute a trajectory on a robot that does speed scaling. The trajectory gets slowed down due to safety reasons, which means it finishes later than originally requested. This will slow down the whole application and no longer behave as programmed. This might be considered a fault.

One could argue that

  1. Depending follow-up actions should be triggered once their post-conditions are met (the motion action finished) and nit by time.
  2. If execution time is critical, that can be monitored as a wrapper. MoveIt! for example does that by default.

I just quickly discussed that with a colleague and he also agreed that the behavior that we currently have is probably the expected one. He was also emphasizing the second point.

So, I guess, I'll just leave that result standing. As I said, I can definitively live with it. And it's probably even better / more appropriate than "I can live with that".

fmauch avatar Jul 05 '24 07:07 fmauch

Update on the tests:

  • testing is currently difficult in this repository, see https://github.com/ros-controls/ros2_controllers/issues/1101
  • There is also a crash in one of the tests, I will open a separate issue with that
  • Running the current tests produces an error in almost all tests:
    unknown file: Failure
    C++ exception with description "can't subtract times with different time sources [3 != 1]" thrown in the test body.
    
  • There is currently an issue with the goal time tolerance. The action goal's tolerance always overwrites the one from he parameter. I'll add a PR for that.

fmauch avatar Jul 05 '24 12:07 fmauch

* testing is currently difficult in this repository, see [Test failures: subscription already associated with a wait set #1101](https://github.com/ros-controls/ros2_controllers/issues/1101)

In the last weeks, I compiled ros2_control+controllers on jammy+humble to run the tests.

christophfroehlich avatar Jul 05 '24 12:07 christophfroehlich

Update on the tests:

* testing is currently difficult in this repository, see [Test failures: subscription already associated with a wait set #1101](https://github.com/ros-controls/ros2_controllers/issues/1101)

* There is also a crash in one of the tests, I will open a separate issue with that

* Running the current tests produces an error in almost all tests:
  ```
  unknown file: Failure
  C++ exception with description "can't subtract times with different time sources [3 != 1]" thrown in the test body.
  ```

So what about the remaining issues especially regarding the tests. From what I can tell so far:

  1. Does not build on rolling because there the updated control_msgs are not used. Probably it would make sense to merge the control_msgs change first. Given that the scaled JTC should/will definitely go upstream there should be no reason not to merge as soon as possible.
  2. Then in case of Jazzy where the build apparently succeeds there are multiple unrelated tests failing

Regarding the tests: @christophfroehlich I am not sure if your suggestion works. I had to explicitly make 1-2 small fixes to some ros2control packages in order to compile ros2control on iron. I would guess that's the same for humble. Is there currently any work in progress in order to get the test working on rolling again?

@fmauch I guess given that you can't run any of the tests at the moment you didn't start to implement any of the new tests we discussed before?

* There is currently an issue with the goal time tolerance. The action goal's tolerance always overwrites the one from he parameter. I'll add a PR for that.

Is there already an issue for that ? Do we need to discuss it in an extra issue?

firesurfer avatar Jul 10 '24 09:07 firesurfer

@firesurfer we have a CI job for that, it currently builds but some tests fail/are flaky https://github.com/ros-controls/ros2_control_ci/actions/runs/9866959705

christophfroehlich avatar Jul 10 '24 16:07 christophfroehlich

@fmauch I guess given that you can't run any of the tests at the moment you didn't start to implement any of the new tests we discussed before?

Precisely. #1206 could be the PR removing this barrier.

Is there already an issue for that ? Do we need to discuss it in an extra issue?

Sorry for not answering earlier: See #1192

So, I'll probably start implementing tests tomorrow, either locally rebased ontop of #1206 or on a merged master :-)

fmauch avatar Jul 16 '24 07:07 fmauch

I was sick yesterday, but I'll try my best to squeeze in the tests until the end of the week.

fmauch avatar Jul 18 '24 06:07 fmauch

This pull request is in conflict. Could you fix it @fmauch?

mergify[bot] avatar Jul 21 '24 21:07 mergify[bot]

@fmauch what's the current state ?

firesurfer avatar Aug 06 '24 11:08 firesurfer

@fmauch what's the current state ?

I've spent some time last week (or the week before?) on trying to get the tests running correctly.

With that I realized that I basically ran into #697. The JTC currently samples the trajectory at the time given to the update loop. In the scaled version, however we basically sample at $t+\Delta t$. That is a conceptual difference. In the latest working group meeting we decided that I will change the JTC accordingly, as this is actually the "proper" approach.

Another issue related to that is that we move forward in the trajectory only based on the controller periods combined with the speed scaling (which again is different to the current method which is why I will have to rewrite the tests). We can't really use the time directly, since the time we spent in the trajectory has been scaled, so we have to sum up the periods.

This will also get interesting when it comes to trajectory replacement, I think, but I haven't looked into that, yet.

So, in summary: Looks like the devil is in the details and this requires still some implementation work to do. While looking at that I did some refactoring work which I haven't pushed yet, as I want to get this together with working basic tests to make sure that the refactoring is actually behaving as expected. As I will be on vacation the next three weeks I don't know how much time I will find for that, though. But I will try.

fmauch avatar Aug 08 '24 08:08 fmauch

I'm planning to split things up a bit into separate PRs to make things with a smaller scope, especially when it comes to adapting the tests.

  • Use $t+\Delta t$ in the JTC
  • Calculate the sample time based upon passed periods instead of the absolute time. This is a requirement for the scaled JTC, since we have to sum up the scaled periods to move forward in the trajectory.
  • Add Speed scaling. This will basically only add the interfaces for speed scaling and change that the period the trajectory is moved forward isn't necessarily the complete last update cycle.

I'll leave out trajectory replacement for now, since this shall be implemented at a later point, anyway.

It might still make sense to merge them all in one PR, but I'll separate at least development like that.

fmauch avatar Aug 12 '24 18:08 fmauch

It might still make sense to merge them all in one PR, but I'll separate at least development like that.

Ping me if you want me to create a feature/scaled_jtc branch in this repo for your individual PRs. However, the first two points seem to be independent somehow.

christophfroehlich avatar Aug 12 '24 18:08 christophfroehlich

This pull request is in conflict. Could you fix it @fmauch?

mergify[bot] avatar Aug 26 '24 12:08 mergify[bot]

@fmauch

feel free to poke me :-)

Any updates ? :D

firesurfer avatar Sep 09 '24 06:09 firesurfer

Any updates ? :D

I almost finished the t+dT change on https://github.com/fmauch/ros2_controllers/tree/jtc_tdt. This change still makes tests fail which I have to investigate: https://github.com/fmauch/ros2_controllers/commit/f4530490e310cd5ee1aafe4cd8962015df34ffb5#diff-8dd2135dc1f4abe6382cc7671ef9ce939196b2d8adf18ac3a77b5c0fddf1a25fR287

fmauch avatar Sep 14 '24 06:09 fmauch

Update: #1306 implements sampling at $t + \Delta T$. Furthermore, https://github.com/ros-controls/ros2_control/pull/1774 should fix incorrect controller periods reported to the controller.

With this the next step would be to propagate through the trajectory based on the accumulated periods instead of the absolute time. I expect that this will be quite a lot of work to make all tests comply with that.

I will update this PR once #1306 is merged.

fmauch avatar Oct 07 '24 13:10 fmauch

Another update:

  • [x] Use t + Δ t in the JTC (implemented in #1306)
  • [x] Calculate the sample time based upon passed periods instead of the absolute time. This is a requirement for the scaled JTC, since we have to sum up the scaled periods to move forward in the trajectory. Implemented in #1395
  • [x] Add Speed scaling. (All merged together currently lives on https://github.com/fmauch/ros2_controllers/tree/scaled_jtc_merged)

Edit:

Note: You will need https://github.com/ros-controls/ros2_control/pull/1774 in order to get valid period times.

fmauch avatar Oct 14 '24 04:10 fmauch

This should be a state that could be started to get reviewed. There are a couple of things left that I'm not 100% happy, but they don't necessarily have to be covered in this PR in my opinion:

  • I started using the feedback's actual.time_from_start to show the controller time passed since the trajectory started and desired.time_from_start for the scaled trajectory time since the trajectory started. This was mainly done since I needed a way to ovserve scaled time for testing purposes.
  • The scaling method assumes (as mentioned in the docs) that the robot is scaling down the trajectory, so we have to scale time when looking at the beginning of the control period (How much of one period have we actually progressed in the last period?) and then we sample one full period ahead. If the robot doesn't scale down but we still want to use scaling, this will leave us with a tracking error. In that case we should scale the period for sampling the control target. My idea would be to make those two scaling methods available with a scaling_type parameteter with the values HARDWARE and CONTROLLER. I would also be happy to move that to a follow-up PR in order to get this one finally wrapped up. @RobertWilbrandt does this summarize the discussion we had a couple of days ago?
  • There are a couple of builds failing that I don't quite understand. gpio controller is complaining about missing messages. Is this a cache thing? Because it is building on the master branch and also locally things are fine, so at least the semi-binary builds should be happy.

fmauch avatar Dec 03 '24 06:12 fmauch

* There are a couple of builds failing that I don't quite understand. gpio controller is complaining about missing messages. Is this a cache thing? Because it is building on the master branch and also locally things are fine, so at least the semi-binary builds should be happy.

~~I had a look now, I don't know tbh. The debian build fails with the same error, but this one doesn't use caches.~~ Ah, I found the issue ;)

christophfroehlich avatar Dec 03 '24 11:12 christophfroehlich

Codecov Report

Attention: Patch coverage is 98.26840% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.47%. Comparing base (fc41a09) to head (c839c94). Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ory_controller/src/joint_trajectory_controller.cpp 93.75% 1 Missing and 2 partials :warning:
...ectory_controller/test/test_trajectory_actions.cpp 98.70% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1191      +/-   ##
==========================================
+ Coverage   86.23%   86.47%   +0.24%     
==========================================
  Files         123      123              
  Lines       11903    12133     +230     
  Branches      994     1008      +14     
==========================================
+ Hits        10264    10492     +228     
+ Misses       1337     1335       -2     
- Partials      302      306       +4     
Flag Coverage Δ
unittests 86.47% <98.26%> (+0.24%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...jectory_controller/joint_trajectory_controller.hpp 100.00% <ø> (ø)
...ory_controller/test/test_trajectory_controller.cpp 99.80% <100.00%> (+0.02%) :arrow_up:
...ntroller/test/test_trajectory_controller_utils.hpp 85.02% <100.00%> (+0.36%) :arrow_up:
...ectory_controller/test/test_trajectory_actions.cpp 97.32% <98.70%> (+0.21%) :arrow_up:
...ory_controller/src/joint_trajectory_controller.cpp 86.82% <93.75%> (+0.71%) :arrow_up:
:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Dec 03 '24 11:12 codecov[bot]

~I had a look now, I don't know tbh. The debian build fails with the same error, but this one doesn't use caches.~ Ah, I found the issue ;)

facepalm I thought I changed that back some time ago already.

fmauch avatar Dec 03 '24 15:12 fmauch

@fmauch whats the current state of the scaled jtc efforts?

firesurfer avatar Mar 27 '25 10:03 firesurfer

@RobertWilbrandt would you mind taking a review on this, as well, since you use it yourself AFAIK.

fmauch avatar Apr 30 '25 11:04 fmauch

Sorry, for opening this again, but going through the unresolved comments, I refined the docstring requested in https://github.com/ros-controls/ros2_controllers/pull/1191#discussion_r2162069371 and extended the userdocs regarding https://github.com/ros-controls/ros2_controllers/pull/1191#discussion_r1870812258.

fmauch avatar Jun 24 '25 19:06 fmauch