apollo icon indicating copy to clipboard operation
apollo copied to clipboard

Why cyber timer_component detect time jump?

Open KelvinJiang opened this issue 2 years ago • 4 comments

System information

  • OS Platform and Distribution Ubuntu 20.04.1
  • Apollo installed from binary
  • Apollo version 7.0:

Steps to reproduce the issue:

When using the timing component of Cyber, there is a time jump prompt in the mainboard.INFO. To eliminate the impact of system time jumps, the ntpd and ntpdate services have been stopped. By using the following two official examples, it is easy to generate logs that detect time jumps.

Timer_component

  1. modify cyber/examples/timer_component_example/timer_component_example.cc , delete AINFO.
#include "cyber/examples/timer_component_example/timer_component_example.h"

#include "cyber/class_loader/class_loader.h"
#include "cyber/component/component.h"
#include "cyber/examples/proto/examples.pb.h"

bool TimerComponentSample::Init() {
  driver_writer_ = node_->CreateWriter<Driver>("/carstatus/channel");
  return true;
}

bool TimerComponentSample::Proc() {
  // static int i = 0;
  // auto out_msg = std::make_shared<Driver>();
  // out_msg->set_msg_id(i++);
  // driver_writer_->Write(out_msg);
  // AINFO << "timer_component_example: Write drivermsg->"
  //       << out_msg->ShortDebugString();
  return true;
}
  1. bazel build --config=opt cyber/examples/timer_component_example:libtimer_component_example.so
  2. cyber_launch start cyber/examples/timer_component_example/timer.launch
  3. tail -f /apollo/data/log/mainboard.INFO

talker example

Modify the cyber example /cyber/examples/talker.cc,just keep rate Sleep function.

#include "cyber/examples/proto/examples.pb.h"

#include "cyber/cyber.h"
#include "cyber/time/rate.h"
#include "cyber/time/time.h"

using apollo::cyber::Rate;
using apollo::cyber::Time;
using apollo::cyber::examples::proto::Chatter;

int main(int argc, char *argv[]) {
  // init cyber framework
  apollo::cyber::Init(argv[0]);
  // create talker node
  auto talker_node = apollo::cyber::CreateNode("talker");
  // create talker
  auto talker = talker_node->CreateWriter<Chatter>("channel/chatter");
  Rate rate(500.0);
  while (apollo::cyber::OK()) {
    rate.Sleep();
  }
  return 0;
}

The rate is set to 500.0 Hz,use bazel run bazel run --config=opt cyber/examples:talker to test. watch /apollo/data/log/talker.INFO

Result

The log shows time forward jump:

I1010 11:35:36.897616 2222807 processor.cc:42] processor_tid: 2222807
I1010 11:35:36.897639 2222808 processor.cc:42] processor_tid: 2222808
I1010 11:35:36.897673 2222809 processor.cc:42] processor_tid: 2222809
I1010 11:35:36.897701 2222810 processor.cc:42] processor_tid: 2222810
I1010 11:35:36.897729 2222811 processor.cc:42] processor_tid: 2222811
I1010 11:35:36.897758 2222812 processor.cc:42] processor_tid: 2222812
I1010 11:35:36.897780 2222506 init.cc:113] Register exit handle succ.
I1010 11:35:36.897787 2222813 processor.cc:42] processor_tid: 2222813
W1010 11:35:37.873878 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:35:37.889809 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:35:46.897998 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:36:32.873950 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:36:38.197471 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:36:38.233948 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:37:08.392994 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:37:43.494035 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:38:08.745947 2222506 rate.cc:96] Detect forward jumps in time
W1010 11:38:38.841961 2222506 rate.cc:96] Detect forward jumps in time

I would like to know the cause of this time jump. Even if only one Cyber process is running, the time jump still occurs, so it should be unrelated to the system load. This time jump can make the program appear to suddenly freeze. It will affect the logical judgment of the business.

I modify the rate.h and rate.cc, to find the detail.

#ifndef CYBER_TIME_RATE_H_
#define CYBER_TIME_RATE_H_

#include "cyber/time/duration.h"
#include "cyber/time/time.h"

namespace apollo {
namespace cyber {

class Rate {
 public:
  explicit Rate(double frequency);
  explicit Rate(uint64_t nanoseconds);
  explicit Rate(const Duration&);
  void Sleep();
  void Reset();
  Duration CycleTime() const;
  Duration ExpectedCycleTime() const { return expected_cycle_time_; }

 private:
  Time start_;
  Time pre_start_;
  Duration expected_cycle_time_;
  Duration actual_cycle_time_;
};

}  // namespace cyber
}  // namespace apollo

#endif  // CYBER_TIME_RATE_H_
#include "cyber/time/rate.h"
#include "cyber/common/log.h"

namespace apollo {
namespace cyber {

Rate::Rate(double frequency)
    : start_(Time::Now()),
      pre_start_(start_),
      expected_cycle_time_(1.0 / frequency),
      actual_cycle_time_(0.0) {}

Rate::Rate(uint64_t nanoseconds)
    : start_(Time::Now()),
      pre_start_(start_),
      expected_cycle_time_(static_cast<int64_t>(nanoseconds)),
      actual_cycle_time_(0.0) {}

Rate::Rate(const Duration& d)
    : start_(Time::Now()), 
      pre_start_(start_),
      expected_cycle_time_(d), 
      actual_cycle_time_(0.0) {}

void Rate::Sleep() {
  Time expected_end = start_ + expected_cycle_time_;

  Time actual_end = Time::Now();

  // detect backward jumps in time
  if (actual_end < start_) {
    expected_end = actual_end + expected_cycle_time_;
    AWARN << "Detect backward jumps in time,"
          << " start_: " << start_
          << " expected_cycle_time_: " << expected_cycle_time_
          << " expected_end: " << expected_end
          << " actual_end: " << actual_end;
  }

  // calculate the time we'll sleep for
  Duration sleep_time = expected_end - actual_end;

  // set the actual amount of time the loop took in case the user wants to kNow
  actual_cycle_time_ = actual_end - pre_start_;

  // make sure to reset our start time
  pre_start_ = start_;
  start_ = expected_end;

  // if we've taken too much time we won't sleep
  if (sleep_time < Duration(0.0)) {
    AWARN << "Detect forward jumps in time,"
          << " expected_end: " << expected_end
          << " actual_end: " << actual_end 
          << " expected_cycle_time_: " << expected_cycle_time_
          << " actual_cycle_time_: " << actual_cycle_time_
          << " sleep_time: " << sleep_time;
    // if we've jumped forward in time, or the loop has taken more than a full
    // extra cycle, reset our cycle
    if (actual_end > expected_end + expected_cycle_time_) {
      AWARN << "Reset start_ time due to actual_end: " << actual_end 
            << " > (expected_end + expected_cycle_time_): " << expected_end + expected_cycle_time_;
      start_ = actual_end;
    }
    // return false to show that the desired rate was not met
    return;
  }

  Time::SleepUntil(expected_end);
}

void Rate::Reset() { start_ = Time::Now(); }

Duration Rate::CycleTime() const { return actual_cycle_time_; }

}  // namespace cyber
}  // namespace apollo

the log shows:

W1010 14:57:56.312966 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:57:56.311983509 actual_end: 2023-10-10 14:57:56.312945266 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004961757s sleep_time: -0.000961757s
W1010 14:58:13.512583 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:13.511983509 actual_end: 2023-10-10 14:58:13.512572935 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004589426s sleep_time: -0.000589426s
W1010 14:58:30.620057 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:30.619983509 actual_end: 2023-10-10 14:58:30.620047829 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004064320s sleep_time: -0.000064320s
W1010 14:58:30.653992 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:30.653983509 actual_end: 2023-10-10 14:58:30.653984063 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004000554s sleep_time: -0.000000554s
W1010 14:58:31.858386 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:31.857983509 actual_end: 2023-10-10 14:58:31.858376229 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004392720s sleep_time: -0.000392720s
W1010 14:58:36.962591 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:36.961983509 actual_end: 2023-10-10 14:58:36.962579108 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004595599s sleep_time: -0.000595599s
W1010 14:58:45.704855 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:45.697983509 actual_end: 2023-10-10 14:58:45.704844747 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.010861238s sleep_time: -0.006861238s
W1010 14:58:45.704902 2234991 rate.cc:115] Reset start_ time due to actual_end: 2023-10-10 14:58:45.704844747 > (expected_end + expected_cycle_time_): 2023-10-10 14:58:45.699983509
W1010 14:58:47.168917 2234991 rate.cc:106] Detect forward jumps in time, expected_end: 2023-10-10 14:58:47.168844747 actual_end: 2023-10-10 14:58:47.168907518 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.004062771s sleep_time: -0.000062771s

KelvinJiang avatar Oct 10 '23 07:10 KelvinJiang

I wrote a python program to detect time jump:

import datetime
import time

while True:
     current_time = datetime.datetime.now()
     print(current_time)
     time.sleep(0.0001)

When the cyber rate log detect time jump.

W1026 13:44:39.377957 683902 rate.cc:52] Detect forward jumps in time, expected_end: 2023-10-26 13:44:39.370923089 actual_end: 2023-10-26 13:44:39.377936730 expected_cycle_time_: 0.002000000s actual_cycle_time_: 0.011013641s sleep_time: -0.007013641s

The time log detect timer jump forward. I think it's not cyber problem. I will check my enviroment.

2023-10-26 13:44:39.367073
2023-10-26 13:44:39.367235
2023-10-26 13:44:39.367399
2023-10-26 13:44:39.367562
2023-10-26 13:44:39.377961
2023-10-26 13:44:39.378156

KelvinJiang avatar Oct 26 '23 05:10 KelvinJiang

This is because your task execution time is longer than the timing period, causing the above phenomenon to occur. For example

You task run 120ms > Timer period is 100ms

daohu527 avatar Oct 27 '23 07:10 daohu527

Thanks for replay. My task is an empty func,just return.

KelvinJiang avatar Oct 27 '23 07:10 KelvinJiang

I tried the above example and the problem you mentioned did not occur.

daohu527 avatar Oct 30 '23 02:10 daohu527