Sharp spikes appear on the time synchronization accuracy curve of the time synchronization protocol in TSN

Open rhornig opened this issue 5 months ago • 4 comments

Originally posted by @feichangniub July 7, 2025

1. The problem

This discussion uses the first gPTP example for TSN. The following figure shows the original result chart:

[image]

2. Analyzing the timestamps

As the figure above shows, the time synchronization simulation produces propagation delay calculation errors (the sawtooth spikes) at whole-second boundaries, for example at 2 s. This is clearly wrong. The figure also shows that Device1 has no spike at 1 s, so first analyze the timestamps of pdelayReq and pdelayResp at that time. In each line below, the first term is the master-clock time, the second term is the node's offset from the master clock, and the sum is the node's local clock time, which is the value carried in the timestamp:

- Device1 sends pdelayReq: 999.992571296 ms + 118.499 ns = 0.999992689795 s
- Switch receives pdelayReq: 999.992621296 ms + 118.424 ns = 0.99999273972 s
- Switch sends pdelayResp to Device1: 1.000007021332 s + 118.427 ns = 1.000007139759 s
- Device1 receives pdelayResp: 1.000007071332 s + 118.503 ns = 1.000007189835 s

[image]

The timestamps recorded in the simulation are fully consistent with these calculated values. The offsets from the master clock show that the switch and Device1 both timestamp these frames before either has received the follow-up frame from its upstream node and adjusted its local clock (once the adjustment triggered by the follow-up frame takes effect, the offset drops to the picosecond level).

At 2 s the spike occurs. The corresponding pdelayReq and pdelayResp timestamps for Device1 are:

- Device1 sends pdelayReq: 2.000001239097 s + 261.687 ns = 2.000001500784 s
- Switch receives pdelayReq: 2.000001289097 s + 261.7 ns = 2.000001550797 s
- Switch sends pdelayResp to Device1: 2.000015689099 s + 12 ps = 2.000015689111 s
- Device1 receives pdelayResp: 2.0000157391 s + 261.719 ns = 2.000016000819 s

The following figure shows the times actually displayed for the switch and Device1 in the simulation:

[image]

Again, the manually calculated timestamps match those of the simulation run.
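As a cross-check of the numbers above, the following sketch recomputes the peer delay for both exchanges with the standard two-way formula ((t4 - t1) - (t3 - t2)) / 2. It assumes a rate ratio of 1 and is not INET's actual implementation; the helper names are hypothetical. Each local timestamp is rebuilt from the master-clock time and offset quoted above, in integer picoseconds to avoid floating-point rounding.

```python
# Sketch only: standard two-way peer delay with rate ratio assumed to be 1.
# Helper names are hypothetical; this is not INET code.

def local_ps(master_ps: int, offset_ps: int) -> int:
    """Local clock reading = master-clock time + offset from the master."""
    return master_ps + offset_ps

def peer_delay_ps(t1: int, t2: int, t3: int, t4: int) -> float:
    """Peer delay from the four timestamps of one pdelay exchange."""
    return ((t4 - t1) - (t3 - t2)) / 2

# (master-clock time, offset from master) in picoseconds, taken from the post.
exchange_1s = [
    (999_992_571_296, 118_499),    # t1: Device1 sends pdelayReq
    (999_992_621_296, 118_424),    # t2: Switch receives pdelayReq
    (1_000_007_021_332, 118_427),  # t3: Switch sends pdelayResp
    (1_000_007_071_332, 118_503),  # t4: Device1 receives pdelayResp
]
exchange_2s = [
    (2_000_001_239_097, 261_687),  # t1: Device1 sends pdelayReq
    (2_000_001_289_097, 261_700),  # t2: Switch receives pdelayReq
    (2_000_015_689_099, 12),       # t3: Switch sends pdelayResp (already synced)
    (2_000_015_739_100, 261_719),  # t4: Device1 receives pdelayResp
]

def measured_delay_ps(exchange) -> float:
    """Delay computed from the local (offset-affected) timestamps."""
    return peer_delay_ps(*(local_ps(m, o) for m, o in exchange))

def true_delay_ps(exchange) -> float:
    """Delay computed from master-clock times only, for comparison."""
    return peer_delay_ps(*(m for m, _ in exchange))

for label, exchange in (("1 s", exchange_1s), ("2 s", exchange_2s)):
    print(label, "measured:", measured_delay_ps(exchange), "ps,",
          "from master time:", true_delay_ps(exchange), "ps")
```

Under these assumptions the measured delay is about 50.0 ns at 1 s but about 180.9 ns at 2 s, while the delay computed from master-clock times stays at about 50 ns in both cases.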

3. Analyzing the problem

This reveals the problem. At 2 s, the clock offset at both the sending and the receiving end of the pdelayReq is about 261.7 ns, which is large. It means that when Device1 sends the pdelayReq and the switch receives it, neither node has yet received the follow-up frame of the current synchronization cycle, so neither has adjusted its local clock. By the time the switch sends the pdelayResp, however, it has received the follow-up frame and performed the adjustment, leaving an offset of only 12 ps. Device1, on the receiving end, has not yet received its follow-up frame, so its offset is still about 261.7 ns. This mismatch produces the spike.

No spike occurs when the sending and receiving of a pdelayReq or pdelayResp both happen before, or both happen after, the follow-up frame. If one timestamp is taken before the follow-up frame and the other after it, the propagation delay is calculated incorrectly. The question is therefore how to ensure that the sender and receiver of each pdelayReq and pdelayResp frame are in the same state, that is, both have received the follow-up frame or neither has (both before or both after the local clock adjustment).

The data above show that the pdelayReq is sent and received while neither node has received the follow-up frame. The pdelayResp, however, is sent after the switch has already received its follow-up frame. Device1 therefore needs to receive its follow-up frame before it receives the pdelayResp, so that both nodes' offsets from the master clock are of the same order of magnitude. The parameter pdelayreqprocessingtime can be used to achieve this: it increases the interval between receiving the pdelayReq and replying with the pdelayResp in the simulation. After modifying it, the time synchronization accuracy chart looks as follows:

[image]

The new chart no longer has spikes.
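The size of the spike can be estimated under a simplifying assumption (rate ratio of 1, which is not necessarily what INET computes). Write each local timestamp as $t_i = T_i + o_i$, where $T_i$ is the master-clock time and $o_i$ is the node's offset from the master at that moment; these symbols are introduced here for illustration only. The measured peer delay $\hat{d}$ then splits into the true delay and an offset-dependent error term:

```math
\hat{d} = \frac{(t_4 - t_1) - (t_3 - t_2)}{2}
        = \frac{(T_4 - T_1) - (T_3 - T_2)}{2} + \frac{(o_4 - o_1) - (o_3 - o_2)}{2}
```

When neither node adjusts its clock during the exchange, $o_4 \approx o_1$ and $o_3 \approx o_2$, so the error term nearly cancels. With the offsets reported at 2 s ($o_1 = 261.687$ ns, $o_2 = 261.7$ ns, $o_3 = 0.012$ ns, $o_4 = 261.719$ ns), the error term is $((261.719 - 261.687) - (0.012 - 261.7))/2$ ns $= 130.86$ ns, roughly half of the switch's pre-adjustment offset.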

rhornig, Jul 07 '25 10:07

@feichangniub please resubscribe here.

rhornig, Jul 07 '25 11:07

Sorry, I don't quite understand what you mean. Do you want me to repost it here?

> @feichangniub please resubscribe here.

feichangniub, Jul 07 '25 13:07

@feichangniub he means you should subscribe to this issue here. There's a button on the right side. This issue has been migrated from the omnetpp discussion page.

Could you please add the exact steps and INET version to reproduce the original problem? Also, it's not clear to me how you fixed the problem. Did you set some parameters to a specific value, or did you change the code?

levy, Jul 08 '25 09:07

> @feichangniub he means you should subscribe to this issue here. There's a button on the right side. This issue has been migrated from the omnetpp discussion page.
>
> Could you please add the exact steps and INET version to reproduce the original problem? Also, it's not clear to me how you fixed the problem. Did you set some parameters to a specific value, or did you change the code?

The INET version is 4.5.4, and the path is showcases/tsn/timesynchronization/gptp. The first gPTP simulation case there reproduces the original behavior with its default settings. The timestamps in the figures are screenshots from the simulation interface. In each calculation, the first value is the simulation time at which the corresponding frame is sent, and the value added to it is the node's error relative to the master clock.

I changed a parameter named pdelayreqprocessingtime. The cause of the problem is that the two nodes that send and receive the same frame have offsets from the master clock of different orders of magnitude (the switch has already received the follow-up frame and completed synchronization, while Device1 has not). This difference ends up in the measured time accuracy. Using this parameter to delay the sending of the pdelayRespFollowUp frame lets Device1 receive the follow-up frame and complete synchronization first, which solves the problem.

feichangniub, Jul 08 '25 12:07