amazon-kinesis-video-streams-webrtc-sdk-c icon indicating copy to clipboard operation
amazon-kinesis-video-streams-webrtc-sdk-c copied to clipboard

[BUG] Monotonic time should be used for calculating the durations such as timeouts

Open tuncalie opened this issue 2 years ago • 4 comments

Describe the bug For calculation of timeouts, GETTIME function is used. GETTIME is defined as globalGetTime function. As it can be seen in https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/utils/src/Time.c#L45 this function reads the time using CLOCK_REALTIME as the clock id when calling clock_gettime. This is the wall clock and it can go forward and back with NTP synchronizations. Using this time for calculating the time passed will result in unexpected behavior.

Instead, time should be read using CLOCK_MONOTONIC as the clock id when calling clock_gettime for the time information that will be used to measure the time passed.

Please refer to clock_gettime man page for more details. This discussion can also be used as a reference: https://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and-clock-monotonic

SDK version number 1.7.1 (and all previous)

Expected behavior Timeouts should be calculated accurately.

Desktop (please complete the following information):

  • OS: Linux

tuncalie avatar Mar 02 '22 20:03 tuncalie

Historically, it was a single clock definition - GETTIME which could be mapped to anything (or overridden like in case of testing or other runtimes). Later, a new time function was added - GETREALTIME https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/utils/src/Time.c#L4 which is used to measure durations with precision. It was supposed to be mapped to the monotonic function but the latest code shows it points to the same realtime. This one is used in the CVAR wait functionality for example. I believe the default time implementation needs to be revisited again. The main points to take though is that the time functionality can be overridden easily by setting globalGetTime function pointer like it's originally done in the codebase:

https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/9ef93e5a41465ec0c847b8e1ee30ac9ba29fdb7f/src/utils/src/Time.c#L3

The other thing to remember is that other runtimes would enforce usage of a particular implementation - for example Java through JNI. And the third point is that the time should be agreed with the auth token retrieved from the backend in order to communicate.

MushMal avatar Mar 02 '22 21:03 MushMal

Yes, I believe the time used for some functionality like communicating with backend should still be the wall time but the time used in calculating timeouts (for instance https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/master/src/source/Ice/TurnConnection.c#L215) should be the monotonic clock.

Should we expect an update in the SDK soon to address this?

tuncalie avatar Mar 02 '22 22:03 tuncalie

I agree it should. Ideally, there should be a new entry in common defs called GETMONOTONICTIME which defaults to the monotonic clock and this is what should be used instead of repurposing the GETTIME. That one is used in many other areas and has far deeper fetching implications.

This said, there are two types of clock drifts that happen.

  1. Multi-core CPU scheduled applications can experience slight time drift backwards when the cores have not yet synchronized.
  2. The system adjusts the real clock based on some input like NTP server.

The turn connection case above will not be susceptible to the first case as the clock drift is generally not more than a few microseconds. However, the second case is a real issue if your platform is set up with a periodic NTP server adjustment and you have a bad clock on the device. Not sure if this is the case or whether this is a real issue you see in the production.

I can't comment about the implementation - perhaps someone from AWS can.

MushMal avatar Mar 03 '22 02:03 MushMal

@tuncalie ,

I agree we should be using MONOTONIC. As far as an update on when this will be done, I will update the ticket after internal discussion.

disa6302 avatar Mar 03 '22 19:03 disa6302

Before the bot closes this issue, yes this is still happening.

tuncalie avatar Nov 19 '22 16:11 tuncalie

This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

github-actions[bot] avatar May 19 '23 00:05 github-actions[bot]

This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

github-actions[bot] avatar Nov 16 '23 00:11 github-actions[bot]