iceoryx icon indicating copy to clipboard operation
iceoryx copied to clipboard

Deprecate posix timer

Open marthtz opened this issue 3 years ago • 54 comments

Brief feature description

Deprecate posix timer from iceoryx code base.

Detailed information

Posix timer has several issues (concurrency, truncation) and is pretty heavy weight and a bit slow. As iceorxy needs to be very responsive the posix timer should be removed and replaced with a lightweight stopwatch.

This will also close #190 as it'll be obsolete.

Tasks

  • [x] create DeadlineTimer or similar and replace this part of the PosixTimer where applicable
  • [x] create an object that creates a thread and peridically executes a task
  • [ ] create a small wrapper around a posix::semaphore which can be used for concurrent::PeriodicTask
  • [x] replace the keep alive message from a std::chrono timestamp with a units::Duration from a timespec initialized with clock_gettime with CLOCK_REALTIME -> maybe better to use a different issue for this

marthtz avatar Nov 05 '20 14:11 marthtz

@marthtz I think a stopwatch is not exactly what we need. We have to rewrite the timer so that the timer does not work with callbacks anymore but instead with signals. If you look at the manpage https://man7.org/linux/man-pages/man2/timer_create.2.html we have to possible approaches:

  • send a signal to the whole process
  • send a signal to a specific thread specified when creating the timer

I think we should explore the second option. Maybe it has similar drawbacks than the callback approach but if not it should be our way to go.

@elBoberido what do you think?

elfenpiff avatar Nov 06 '20 08:11 elfenpiff

@elfenpiff @marthtz please don't call it stopwatch. A stopwatch is measuring time, this functionality is running towards a deadline or countdown. See also the equivalent int boost and Qt.

To the other points. The posix timer was started as a deadline timer and then enhanced with the posix calls. Currently it has both functionalities.

  1. cut out the deadline functionality and use that class where appropriate
  2. deprecate the current posix timer and replace the remaining occurrences with std::thread and sleep_for
  3. decide if we really need something like the posix timer. If we do, implement the second option from @elfenpiff which is also described in more detail in #170. But do this in an own class. Don't repeat the mistake and stuff it into the deadline functionality.

elBoberido avatar Nov 06 '20 13:11 elBoberido

@elfenpiff I am currently working on this story. Based on the comments from @dkroenke I am working on to replace the timer.start() function with a simple thread sleep. Currently the posixtimer.start() is being called within the posh_runtime, service_discovery_notifier.

shankar-in avatar Nov 06 '20 13:11 shankar-in

@shankar-bosch a simple sleep_for is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use the sleep_for.

elBoberido avatar Nov 06 '20 13:11 elBoberido

@elBoberido okay I will do so

shankar-in avatar Nov 06 '20 13:11 shankar-in

@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.

elfenpiff avatar Nov 06 '20 16:11 elfenpiff

@elfenpiff sure. I am not in the list of contributors. Could you pls add me? I can assign this issue in my name and bring this to the board.

shankar-in avatar Nov 07 '20 15:11 shankar-in

@elBoberido I have clarifications. As you know the timers can be one shot or periodic based on the it_interval . If it is a one shot timer there can be a thread::sleepfor(it_value). if the timer is a periodic then there has to be a thread::sleepfor(it_value) followed by a thread::sleepfor(it_interval) until the application exit? I am not sure what to do here.

I also dont see the start() function within the timer.cpp invoking a thread after a timer is disarmed. Is it done in a different place? could you pls also help me here?

shankar-in avatar Nov 09 '20 07:11 shankar-in

@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.

@elfenpiff This can only be done by an Eclipse committer. Eclipse committers are voted on in elections.

mossmaurice avatar Nov 09 '20 07:11 mossmaurice

@mossmaurice Thank you

shankar-in avatar Nov 09 '20 08:11 shankar-in

@shankar-bosch a simple sleep_for is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use the sleep_for.

@elBoberido How about just removing the timer from iceoryx ? I would need to double check but the current use-cases we have within iceoryx can be resolved using a simple sleep_for (e.g. sending heartbeat).

orecham avatar Nov 09 '20 09:11 orecham

@ithier that's the idea of this issue. There is an additional functionality in the timer (checking for a deadline) and that should to be preserved

elBoberido avatar Nov 09 '20 10:11 elBoberido

@shankar-bosch the thread is started implicitly. Have a look at https://linux.die.net/man/2/timer_create, we use SIGEV_THREAD

If you use just sleep_for, the semantic of the code changes. In this case the thread which previously called start is now blocked, while it just continues running with the current timer implementation. You have to create a thread and use sleep_for in a loop. I would suggest to start with cutting out the functionality of the DeadlineTimer to cxx/deadline_timer.hpp/cpp. This should be way easier. You can do it by copying the current timer to cxx and removing everything from it which is related to the m_osTimer member

elBoberido avatar Nov 09 '20 11:11 elBoberido

Here is where the timer is used in iceoryx:

  • sending heartbeat signal -> can be replaced by creating a single extra thread that calls sleep_for
  • waiting for roudi up to a deadline -> can be replaced by saving epoch time and just comparing to the current time each iteration, this does not need another thread

From above, it looks like we only need to do add the heartbeat thread above and write some kind of helper method to wrap the usage of timer_gettime to satsify the second usecase. Then we can just mark the old timer as deprecated and we are done with this issue, right ?

orecham avatar Nov 09 '20 11:11 orecham

it just occurred to me that sleep_for might be really bad for the test time, since it's not interruptible. How about creating a small wrapper around a semaphore with timed wait? e.g.

class Timer {
public:
    Timer() noexcept = default;
    ~Timer() noexcept; // calls stop
    cxx::expected<TimerError> wait(cxx::Duration) noexcept; // blocking wait on the semaphore
    void stop() noexcept; // sem_post to interrupt waiting
    static cxx::expected<units::Duration, TimerError> Timer::now() noexcept; // this is from the old Timer and also used in the codebase
}

Later on the semaphore could be replaced by POSIX timer_create(..) etc, if necessary. What do you think?

elBoberido avatar Nov 09 '20 13:11 elBoberido

Current PR #333 also makes use of the timer. Basically, it's checking for a timeout / interval expiry.

@elBoberido wrapper sounds good to me and could be used for this and @ithier first bullet point.

marthtz avatar Nov 09 '20 14:11 marthtz

@elBoberido Thanks for the wrapper solution.

shankar-in avatar Nov 10 '20 08:11 shankar-in

We should definitely fix this for 1.0, On ARM 64 I have failing tests because of #190 but I fear that's not the only problem

budrus avatar Nov 20 '20 18:11 budrus

@budrus to fix #190 we need to refactor the units::Duration away from long double to chrono or timespec/timeval

elBoberido avatar Nov 20 '20 18:11 elBoberido

@elBoberido That's at least half of the game. The failing DestructorIsBlocking could maybe be another thing. This also fails with larger numbers...

budrus avatar Nov 20 '20 18:11 budrus

@budrus it could still be a rounding issue. AFAIK, the timer doesn't sleep for e.g. 100ms but for currentTime + 100ms. This might result in rounding errors with the long double, especially since that's only an alias to double on ARM and 80 bit wide on x86 ... except for MSVC where it is also an alias to double

elBoberido avatar Nov 20 '20 18:11 elBoberido

@budrus to fix #190 we need to refactor the units::Duration away from long double to chrono or timespec/timeval

@elBoberido @budrus I suggest here to use neither of them. I would create a struct which looks like timespec

struct duration_t {
     uint64_t seconds;
     uint32_t nanoseconds;
};

The reason is that a duration cannot be negative, you cannot wait -5 seconds but timespec/timeval can be negative. Additionally, timespec/timeval do not have a fixed size they are using the types long and time_t which can differ on different architectures.

And when we refactor units::Duration we should get rid of all the long double operators which are absolutely unnecessary - and I know I was the one who introduced them in the first place :/ .

elfenpiff avatar Nov 20 '20 21:11 elfenpiff

I will take the solution for the truncation issue as suggested by @elfenpiff. Thank you. @budrus Is there a seperate ticket for failing DestructorIsBlocking issue?

shankar-in avatar Nov 23 '20 09:11 shankar-in

@shankar-bosch No, as @elBoberido said, it could also have the same root cause as #190. We would have to analyze this but now things are changing anyway

budrus avatar Nov 23 '20 09:11 budrus

@budrus Okay. I believe the issue #190 fix should be available for 1.0. So increasing the scope of #337 to fix #190 and analysing the failing DestructorIsBlocking should be fine?

shankar-in avatar Nov 23 '20 09:11 shankar-in

since we are approaching the 0.90 release, is there any update on this?

elBoberido avatar Dec 17 '20 13:12 elBoberido

Yes the PR is getting ready. Should be raised shortly.

shankar-in avatar Dec 18 '20 08:12 shankar-in

@shankar-bosch for Deadline or keep-alive?

elBoberido avatar Dec 18 '20 10:12 elBoberido

Deadline is ready. Keep alive is under testing. The PR should cover both.

shankar-in avatar Dec 18 '20 12:12 shankar-in

@shankar-bosch do you think you can get this in until monday? If not, I could do a fast fix for the keep alive.

Just as reminder to keep the PRs small. The PR you are preparing could be split into two. The dealine timer and the keep alive functionality.

elBoberido avatar Dec 18 '20 13:12 elBoberido