amazon-kinesis-video-streams-webrtc-sdk-c
amazon-kinesis-video-streams-webrtc-sdk-c copied to clipboard
[BUG] Monotonic time should be used for calculating the durations such as timeouts
Describe the bug
For calculation of timeouts, GETTIME
function is used.
GETTIME
is defined as globalGetTime
function.
As it can be seen in https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/utils/src/Time.c#L45 this function reads the time using CLOCK_REALTIME
as the clock id when calling clock_gettime
. This is the wall clock and it can go forward and back with NTP synchronizations. Using this time for calculating the time passed will result in unexpected behavior.
Instead, time should be read using CLOCK_MONOTONIC
as the clock id when calling clock_gettime
for the time information that will be used to measure the time passed.
Please refer to clock_gettime man page for more details. This discussion can also be used as a reference: https://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and-clock-monotonic
SDK version number 1.7.1 (and all previous)
Expected behavior Timeouts should be calculated accurately.
Desktop (please complete the following information):
- OS: Linux
Historically, it was a single clock definition - GETTIME which could be mapped to anything (or overridden like in case of testing or other runtimes). Later, a new time function was added - GETREALTIME https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/utils/src/Time.c#L4 which is used to measure durations with precision. It was supposed to be mapped to the monotonic function but the latest code shows it points to the same realtime. This one is used in the CVAR wait functionality for example. I believe the default time implementation needs to be revisited again. The main points to take though is that the time functionality can be overridden easily by setting globalGetTime function pointer like it's originally done in the codebase:
https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/9ef93e5a41465ec0c847b8e1ee30ac9ba29fdb7f/src/utils/src/Time.c#L3
The other thing to remember is that other runtimes would enforce usage of a particular implementation - for example Java through JNI. And the third point is that the time should be agreed with the auth token retrieved from the backend in order to communicate.
Yes, I believe the time used for some functionality like communicating with backend should still be the wall time but the time used in calculating timeouts (for instance https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/master/src/source/Ice/TurnConnection.c#L215) should be the monotonic clock.
Should we expect an update in the SDK soon to address this?
I agree it should. Ideally, there should be a new entry in common defs called GETMONOTONICTIME which defaults to the monotonic clock and this is what should be used instead of repurposing the GETTIME. That one is used in many other areas and has far deeper fetching implications.
This said, there are two types of clock drifts that happen.
- Multi-core CPU scheduled applications can experience slight time drift backwards when the cores have not yet synchronized.
- The system adjusts the real clock based on some input like NTP server.
The turn connection case above will not be susceptible to the first case as the clock drift is generally not more than a few microseconds. However, the second case is a real issue if your platform is set up with a periodic NTP server adjustment and you have a bad clock on the device. Not sure if this is the case or whether this is a real issue you see in the production.
I can't comment about the implementation - perhaps someone from AWS can.
@tuncalie ,
I agree we should be using MONOTONIC. As far as an update on when this will be done, I will update the ticket after internal discussion.
Before the bot closes this issue, yes this is still happening.
This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.
This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.