cuckoo_time_translator

Tuning CCT

Open rikba opened this issue 5 years ago • 2 comments

Hi @HannesSommer

Thanks again for your library. We ran ctt_introspect.py to visualize the performance of our time translation.

Our system consists of various sensors, event-stamped on an Arduino 101 and transmitted via USB to a Braswell CPU (image: findmine_sensor_pod).

We use a single DefaultDeviceTimeUnwrapperAndTranslatorWithTransmitTime. We set the transmit time just before sending the first byte of any sensor message, and the receive time just before receiving the first byte of any sensor message.

Running

ctt_introspect.py sensors_2019-03-08-11-22-05.bag -t /moa/device_time -b 'LeastSquares' --dontPlotPreTranslated -f 'ConvexHullOwt(switchTime = 100), ConvexHullOwt(switchTime = 50), ConvexHullOwt(switchTime = 25), KalmanOwt()'
Analyzing topic :/moa/device_time
Baseline owt after translation: LeastSquaresOwt(): offset=1552038417.750223, skew=0.000001
After translating: SwitchingOwt(switchingTimeSecs=100, currentOwt=ConvexHull()): currentOwt:offset=1552038417.769927, skew=0.999996, stackSize=7, pendingOwt:offset=1552038417.785544, skew=0.999990, stackSize=7
After translating: SwitchingOwt(switchingTimeSecs=50, currentOwt=ConvexHull()): currentOwt:offset=1552038417.770817, skew=0.999995, stackSize=9, pendingOwt:offset=1552038417.785544, skew=0.999990, stackSize=8
After translating: SwitchingOwt(switchingTimeSecs=25, currentOwt=ConvexHull()): currentOwt:offset=1552038417.769901, skew=0.999996, stackSize=7, pendingOwt:offset=1552038417.785544, skew=0.999990, stackSize=8
Warning: KalmanOwt: local_time=1552040532.528224, remote_time=2114.766133 -> measurement_residual=0.00311399, mahal_distance=1.36431!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040599.746658, remote_time=2181.986240 -> measurement_residual=0.0022831, mahal_distance=1.12423!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040652.929971, remote_time=2235.168469 -> measurement_residual=0.00337052, mahal_distance=1.66291!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040661.967342, remote_time=2244.206897 -> measurement_residual=0.00228477, mahal_distance=1.127!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040735.233943, remote_time=2317.468950 -> measurement_residual=0.00652099, mahal_distance=3.2173!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040787.436264, remote_time=2369.674429 -> measurement_residual=0.00316882, mahal_distance=1.56341!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040802.987965, remote_time=2385.226795 -> measurement_residual=0.00240159, mahal_distance=1.18473!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040835.107295, remote_time=2417.344354 -> measurement_residual=0.00386286, mahal_distance=1.90579!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040934.482291, remote_time=2516.712349 -> measurement_residual=0.00975299, mahal_distance=4.8119!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040941.500742, remote_time=2523.737822 -> measurement_residual=0.00271297, mahal_distance=1.33819!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040983.140631, remote_time=2565.374523 -> measurement_residual=0.00595856, mahal_distance=2.93981!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552040993.170724, remote_time=2575.408034 -> measurement_residual=0.0026269, mahal_distance=1.2958!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
Warning: KalmanOwt: local_time=1552041023.290523, remote_time=2605.527826 -> measurement_residual=0.00289035, mahal_distance=1.42596!
         at line 81 in /home/rikba/catkin_ws/src/cuckoo_time_translator/cuckoo_time_translator_algorithms/src/KalmanOwt.cpp
After translating: Kalman(updateCooldownSecs=0.5, sigmaSkew=2e-06, sigmaOffset=0.002, sigmaInitSkew=0.001, sigmaInitOffset=0.002, outlierThreshold=1): offset=1552038417.759727, skew=-0.000006, dt=0.500076
Deviation from base line:
receive times: mean=-0.000237 ms, std=0.664349 ms
ConvexHullOwt(switchTime=100): mean=-0.873163 ms, std=0.559993 ms
ConvexHullOwt(switchTime=50): mean=-0.760489 ms, std=0.452352 ms
ConvexHullOwt(switchTime=25): mean=-0.734920 ms, std=0.425684 ms
KalmanOwt(): mean=-0.023703 ms, std=0.460371 ms

on one of our datasets gives the following plot (image: cct_introspect).

Given this plot our conclusions are:

  1. The random delay is significant (up to 20ms) and CCT should definitely be used
  2. We should use either ConvexHullOwt(switchTime=25) or KalmanOwt() with its default tuning as those two adapt nicely to the changing offset and skew.
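The "Deviation from base line" figures in the output above can be approximated with a short sketch. It assumes that the baseline is an ordinary least-squares line fitted to the receive times and that the deviation is reported as mean and standard deviation in milliseconds. The function names and the synthetic data are illustrative and are not taken from ctt_introspect.py.

```python
import random
import statistics


def least_squares_baseline(remote, receive):
    """Fit receive ~ a + b * remote and return the fitted value per sample."""
    r_mean = statistics.fmean(remote)
    l_mean = statistics.fmean(receive)
    sxx = sum((r - r_mean) ** 2 for r in remote)
    sxy = sum((r - r_mean) * (l - l_mean) for r, l in zip(remote, receive))
    slope = sxy / sxx
    return [l_mean + slope * (r - r_mean) for r in remote]


def deviation_ms(times, baseline):
    """Mean and standard deviation of (times - baseline), in milliseconds."""
    diffs = [(t - b) * 1e3 for t, b in zip(times, baseline)]
    return statistics.fmean(diffs), statistics.pstdev(diffs)


# Synthetic stand-in for a bag file: a slightly skewed device clock plus a
# random, always positive transport delay (mean 0.5 ms).
rng = random.Random(0)
remote = [0.5 * i for i in range(200)]
receive = [10.0 + r * (1.0 + 5e-6) + rng.expovariate(2000.0) for r in remote]

baseline = least_squares_baseline(remote, receive)
mean_ms, std_ms = deviation_ms(receive, baseline)
print(f"receive times: mean={mean_ms:.6f} ms, std={std_ms:.6f} ms")
```

Under this assumption the mean deviation of the raw receive times is zero by construction, which would be consistent with the near-zero mean on the "receive times" line. The means of the filters then show how far each one sits from that average-delay line.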

Are these conclusions correct? Which filter would you recommend?

rikba, Mar 12 '19 20:03

@rikba , thank you for this beautiful plot of a really bad clock (skew changes) and transport randomness - this looks like the perfect justification for this library ;).

Yes, I agree with both 1. and 2. You might additionally want to try an even shorter-lived ConvexHullOwt, such as 20 s or 15 s (these are really crazy numbers). Given the plot, I would actually go for a short-lived ConvexHullOwt. As you can nicely see, the Kalman alternative (with its current parameters) changes its mean offset from the minimum delay boundary (the lower bound of the red curve) significantly over longer periods of time: it's smaller when the lower bound is rising and larger when it is falling, and the difference seems to be around 2 ms. This would be a systematic error, persisting for some time, that you cannot calibrate away. Systematic errors (over longer periods) are less extreme with the CH, and you can make them even smaller with a slightly shorter switching period, at the expected expense of slightly higher short-term errors. Which of the two you pick should depend on what you do with the timestamps later.
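To illustrate why a convex-hull filter tracks the minimum delay boundary, here is a minimal sketch of the general idea, not the library's implementation. Because the transport delay is never negative, the true clock mapping lies on or below every (remote time, receive time) sample, so an edge of the lower convex hull is a natural candidate line. Picking the edge under the midpoint of the window is an assumption made for this sketch, and all names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (remote, receive) pairs, ordered left to right."""
    hull = []
    for p in sorted(points):
        # Drop points that would make the chain turn clockwise or go straight.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def fit_min_delay_line(points):
    """Return (slope, offset) of the hull edge below the window midpoint.

    Assumes strictly increasing remote times within the window.
    """
    hull = lower_hull(points)
    if len(hull) < 2:
        raise ValueError("need at least two samples with distinct remote times")
    mid = 0.5 * (hull[0][0] + hull[-1][0])
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= mid <= x1:
            slope = (y1 - y0) / (x1 - x0)
            return slope, y0 - slope * x0
    raise ValueError("no hull edge spans the midpoint")


# Synthetic window: known clock line plus a delay that is zero every 10th sample.
TRUE_SLOPE, TRUE_OFFSET = 1.00001, 100.0
samples = []
for i in range(200):
    remote = 0.5 * i
    delay = 0.002 * ((7 * i) % 10) / 10  # between 0 and 1.8 ms
    samples.append((remote, TRUE_OFFSET + TRUE_SLOPE * remote + delay))

slope, offset = fit_min_delay_line(samples)
print(f"slope={slope:.6f}, offset={offset:.6f}")
```

The fitted line never lies above a sample, so delayed messages do not pull it upward. Restarting the fit on a fresh window (the role the switch time plays in the output above) limits how long an outdated slope can persist.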

As for what this says about the underlying problems, things are unfortunately a bit less clear. The random delays could be USB delays (assuming it still goes through USB; I couldn't fully figure that out from the image). But they could also stem (partially) from some buffering on the receiver or transmitter side (I'm not sure about either the hardware_.write() or the AsyncReadBuffer). So maybe one can reduce those with a little digging. Of course, thanks to the transmit timestamp you can correct for them quite nicely with the timestamp translation approach. A delay of 20ms, however, can still affect your system, because you still have to wait out this extra delay for your data, which may or may not be a problem.

The quite fast change of skew could indicate larger temperature changes on the microcontroller. Can you easily access the temperature? I could imagine that changing air currents, and possibly the changing (relative) sun bearing, can change the temperature quite quickly on drones.
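As a rough sense of scale for the skew change: in the output above, the current and pending convex hulls for switchTime=100 report skews of 0.999996 and 0.999990, a difference of about 6 ppm. The sketch below assumes this difference reflects a real change in clock rate and computes how much offset error a line with the outdated skew would accumulate over one switching period. It is an order-of-magnitude illustration only, and the function name is hypothetical.

```python
def drift_ms(skew_used, skew_actual, seconds):
    """Offset error in ms accumulated when translating with an outdated skew."""
    return (skew_used - skew_actual) * seconds * 1e3


# Skew values taken from the SwitchingOwt(switchingTimeSecs=100) line above.
for switch_time in (100, 50, 25):
    error = drift_ms(0.999996, 0.999990, switch_time)
    print(f"{switch_time:>3} s: {error:.2f} ms")
```

A 6 ppm mismatch amounts to roughly 0.6 ms over 100 s, 0.3 ms over 50 s and 0.15 ms over 25 s, the same order of magnitude as the deviations reported in the output.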

HannesSommer, Mar 12 '19 21:03

Thanks again. I hope it's also a good reference for other users.

"really bad clock": I hope the bad clock is not the only issue, because otherwise the whole setup is kind of bad.

I'll try the following things to separate the issues in the setup and document the results here.

  • [ ] Disable chrony on the host machine to avoid jumps (I enabled initstepslew) and changing skew.
  • [ ] Send only a small message (event_stamp + transmission_stamp) over the serial connection to separate problems with buffering.
  • [ ] Change from USB to UART for serial connection between Braswell and Arduino 101.
  • [ ] Increase process priority on the serial readout.
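For the small-message experiment in the list above, a host-side sketch of such a payload could look as follows. The layout (two little-endian unsigned 32-bit microsecond counters) and all names are assumptions made for illustration; the actual message format of this setup is not shown in the thread.

```python
import struct

# Hypothetical layout: event stamp and transmit stamp, both uint32 microseconds.
STAMP_FORMAT = "<II"
STAMP_SIZE = struct.calcsize(STAMP_FORMAT)  # 8 bytes on the wire


def pack_stamps(event_us, transmit_us):
    """Encode both stamps, wrapping them to 32 bits like a device counter."""
    return struct.pack(STAMP_FORMAT, event_us & 0xFFFFFFFF, transmit_us & 0xFFFFFFFF)


def unpack_stamps(payload):
    """Decode a payload back into (event_us, transmit_us)."""
    return struct.unpack(STAMP_FORMAT, payload)


payload = pack_stamps(2_114_766_133, 2_114_766_400)
event_us, transmit_us = unpack_stamps(payload)
print(STAMP_SIZE, event_us, transmit_us)
```

An 8-byte payload keeps serialization and buffering effects small, so any remaining jitter is more likely to come from the transport or the host. A 32-bit microsecond counter wraps after about 71.6 minutes, so the host still has to unwrap it.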

rikba, Mar 14 '19 13:03