arch/xmc4 Add tickless support
Summary
Add tickless support for the xmc4 chip. Unfortunately the xmc4 CCU (Capture/Compare Unit) can't be used to implement the tickless feature via an alarm (a single free-running timer), because the compare value cannot be updated on the fly (see the shadow registers in the RM). So I used two timers: one free-running (for timing) and one one-shot (for the interval).
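The two-timer split can be sketched as below. Everything here is a stand-in invented for illustration (the mock counters, the 1 MHz clock, the function names modeled on the up_timer_gettime/up_timer_start calls visible in the logs later in this thread); the real driver reads and programs the CCU slice registers, and the expiry path would end up in the scheduler's tickless expiration handler.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_HZ 1000000ULL        /* assumed 1 MHz timer clock: 1 tick == 1 usec */

static uint64_t g_freerun_ticks;   /* free-running counter: only ever read   */
static uint64_t g_oneshot_expiry;  /* absolute tick of the next interval end */
static int      g_oneshot_armed;

/* Timing side (the up_timer_gettime role): read the free-running timer. */
static void timer_gettime(uint64_t *sec, uint32_t *nsec)
{
  *sec  = g_freerun_ticks / TIMER_HZ;
  *nsec = (uint32_t)((g_freerun_ticks % TIMER_HZ) * 1000ULL); /* usec -> nsec */
}

/* Interval side (the up_timer_start role): arm the one-shot timer. */
static void timer_start(uint64_t delay_usec)
{
  g_oneshot_expiry = g_freerun_ticks + delay_usec;
  g_oneshot_armed  = 1;
}

/* Simulated passage of time; returns 1 when the interval expires, which is
 * where the real interrupt handler would notify the scheduler. */
static int timer_advance(uint64_t usec)
{
  g_freerun_ticks += usec;
  if (g_oneshot_armed && g_freerun_ticks >= g_oneshot_expiry)
    {
      g_oneshot_armed = 0;         /* one-shot: fires exactly once */
      return 1;
    }

  return 0;
}
```

The point of the split is that the timing timer is never stopped or reprogrammed, so reading the time is always cheap and monotonic; only the interval timer is re-armed per wakeup.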
Notes:
- The XMC has 4 CCUs, each with 4 slices of 16 bits that can be concatenated to form up to a 64-bit timer. At the beginning I thought this concatenation feature could help me avoid having a max wait delay (or at least get a very long one). But the concatenation doesn't really behave like a single wide timer. For instance, if you concatenate 2 slices, the second slice only increments its counter once the first one has finished its period. That means you have to find A and B such that A * B = Period. This problem seems simple, but it's 1 equation with 2 unknowns... I used a simple algorithm to find them (only for 2 x 16 bits) and it took around 80 ms to compute. So even with heavy optimization, I think we are far from being able to use this concatenation feature for tickless.
- I was inspired by both the STM32 and SAM tickless support. The STM32 one is simpler, as it uses the registers directly to start the timers, and I didn't want to write a full CCU driver only for tickless... (contributions are welcome). The SAM one let me compute the prescaler for the timer (the sam_tc_divisor logic).
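The "1 equation, 2 unknowns" problem from the first note amounts to a divisor search: find A, B with each at most 2^16 and A * B equal to the requested period. A naive sketch (illustration only, not the algorithm used in the driver) shows why this is fragile: any period with no divisor pair that fits in 16 bits, e.g. a large prime, simply cannot be represented exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Search for per-slice period values A and B (each <= 65536) such that
 * A * B == period, as required when two 16-bit CCU slices are concatenated.
 * Returns 1 on success, 0 when no exact split exists. Illustration only. */
static int split_period(uint64_t period, uint32_t *a, uint32_t *b)
{
  for (uint32_t cand = 1; cand <= 65536 && cand <= period; cand++)
    {
      if (period % cand == 0 && period / cand <= 65536)
        {
          *a = cand;
          *b = (uint32_t)(period / cand);
          return 1;
        }
    }

  return 0; /* e.g. a prime period > 65536 has no exact representation */
}
```

This is why the concatenation is unusable for arbitrary tickless delays: even when an exact split exists, finding it costs a search, and when it doesn't you would have to approximate the period.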
Impact
Testing
Tested on xmc4800-relax dev board
@leducp @trns1997 @Rdk-T @acassis
Hi everyone,
As you can see in this MR, I'm trying to add tickless support for the xmc4 chip.
However, I couldn't make it work, which is why I need your help.
Everything seems to work except the interrupt of the interval timer. (For your information, we can't do the alarm implementation with the xmc4: we can't update the compare value on the fly...)
The timer starts correctly (I've checked the value of the counter), BUT the interrupt doesn't get triggered. Why? Because we are inside a critical section. If I put a leave_critical_section(0) when I start the interval timer, the interrupt is triggered, BUT then the task is not scheduled (I don't know why).
Moreover, if I set a very small period (I use period match for the timer), the interrupt is triggered... No idea why...
I know I'm missing something... I just can't figure out what...
Any help is appreciated !
Thanks
@xiaoxiang781216 @raiden00pl @pkarashchenko @patacongo any idea? Inside the critical section we disable IRQ so I think this is by design, but no idea how tickless handles it.
For a bit more info :
I do a usleep(600000) between two Hello world print, and here's the output.
Hello, World!!
up_timer_gettime: usec=349525 ts=(0, 349525000)
up_timer_gettime: usec=367001 ts=(0, 367001000)
up_timer_start: ts=(0, 200000000)
nx_start: CPUxmc4_interval_handler: Expired...
up_timer_start: ts=(0, 410000000)
0: Beginning Idle Loop
The first interval timer interrupt is triggered but not the second one.
@nicolas71640 do you need to change some XMC45 timer register to let the second event happen? Seems like a hardware issue.
Sorry about wrong closes/reopen the PR.
No, I don't think so. This code works outside of the tickless context.
Plus, the usleep stops the timing interrupt too...
I have found the issue. It was hardware indeed, but not on the side I was looking at... On the XMC, the CCU clock is disabled by default in sleep mode... I didn't know that... So the idle task disabled the CCU clock and therefore the timer... I'll push the changes soon!
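The fix described above boils down to keeping the CCU clock enabled while the core sleeps. A hedged sketch follows; the register name, bit position, and the plain variable standing in for the memory-mapped SCU register are all invented so the snippet is self-contained, and the real code sets the corresponding bit in the XMC4 SCU sleep-control register (see the reference manual for the exact names).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the SCU sleep-control register (hypothetical; the real
 * driver writes the actual memory-mapped register). */
static volatile uint32_t g_scu_sleepcr;

#define SLEEPCR_CCU_ENABLE (1u << 0) /* bit position invented for illustration */

/* Keep the CCU clock running in sleep mode so the tickless interval timer
 * keeps counting while the idle task sleeps the CPU. */
static void ccu_keep_clock_in_sleep(void)
{
  g_scu_sleepcr |= SLEEPCR_CCU_ENABLE;
}
```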
@nicolas71640 please fix also the coding style issues raised by CI
Hello Everybody,
Thank you for all your feedbacks.
I might be at the end of the road concerning xmc tickless... The performance is far from acceptable (compared to tick-based NuttX). I did a little benchmark where I start two small threads supposed to do very short tasks (basically just waking up) at two different periods (10 ms and 60 ms). When the two threads wake up at different times, there is no issue: the jitter is about my resolution. However, when both of them wake up around the same time, the jitter can be 5 or 6 times my resolution.
I have profiled the scheduling of my threads (with gpio):
- Thread B => Period = 60ms, Priority = 11
- Thread A => Period = 10ms, Priority = 10
- Start => up_timer_start function, from beginning of method to return
- Cancel => up_timer_cancel function, from beginning of method to return
- Sleep => clock_nanosleep function, from beginning of method to return
- Interrupt => interval timer interrupt function, from beginning of method to return
Here you can see that everything seems all right. Thread A and Thread B seem to be scheduled according to their periods. However, the timings are bad: I measured 60.183 ms for the first period of B, and 59.828 ms for the second one (my thread checks whether it's late or early and adjusts its sleep accordingly).
If we zoom in, here is what we can see when the threads are scheduled next to each other:
What I understand from this is that Thread B is delayed by Thread A. From the first interrupt (yellow line):
1. An interrupt is triggered: Thread A must be scheduled.
2. The next timer is started, to trigger Thread B (a few tens of µs later).
3. The Thread A routine starts.
4. Thread A gets interrupted by the second interrupt (the one we just started in 2).
5. The next timer is started (this must be the delay I've put in to end the overall loop).
6. The Thread B routine starts.
7. Thread B puts itself to sleep.
8. The scheduler cancels the timer we started in 5.
9. A new timer is started to schedule Thread B in 60 ms.
10. Thread A, which wasn't finished, is scheduled back and puts itself to sleep.
11. The timer we started in 9 is cancelled, as the new one will be shorter (10 ms < 60 ms).
12. The new one is started.
This sequence seems perfectly normal to me. Can you confirm? So if my implementation of tickless is correct, it means I'll always have this kind of delay...
Now what's disturbing is: why does the scheduler start another timer to trigger Thread B (after having triggered Thread A) when it's already late?
Hi @nicolas71640, very nice test. I don't know much about the Tickless implementation, but I think thread A will need to set up a new timer to wake up thread B, because otherwise there will not be any pending event in the system and the scheduler will not wake up again.
Maybe @patacongo, who implemented the Tickless mode, could give more details, or someone with more Tickless experience than me, like @masayuki2009 or @xiaoxiang781216, can help here.
@nicolas71640 suggestion: please include this graphic and text as documentation to Tickless on XMC4500. I'm sure it will help more people!
@nicolas71640 I would suggest that you utilize the common code to implement the timer-related up_xxx API:
https://github.com/apache/nuttx/blob/master/drivers/timers/arch_alarm.c
https://github.com/apache/nuttx/blob/master/drivers/timers/arch_timer.c
https://github.com/apache/nuttx/blob/master/drivers/timers/arch_rtc.c
That way, you can just write the timer driver and reuse the code which maps the up_xxx API to the timer driver API:
https://github.com/apache/nuttx/blob/master/drivers/timers/oneshot.c
https://github.com/apache/nuttx/blob/master/drivers/timers/timer.c
https://github.com/apache/nuttx/blob/master/drivers/timers/rtc.c
Another benefit is that you only need to write the code once, but the timer can be used by the OS through the up_xxx API, or by apps through the /dev/xxx API, easily. Or at least, you need to be familiar with arch_xxx.c, which could answer your questions.
Hello everybody,
Thanks @xiaoxiang781216 for your answer. I was just about to post the conclusion of my issue. I have improved my diagrams a bit, and my understanding of them.
@acassis I am sorry, I won't add anything to the documentation. I have spent enough time on this implementation, and can't spend more...
I will fix the small formatting issues but I won't go further, as tickless won't be useful for us. I leave it to your decision whether you want to merge this PR or not, knowing that the implementation works perfectly: it is more or less a copy-paste of the STM32 one.
Tickless Profiling
Here's my test configuration:
Two threads:
- ThreadA
  - Priority = 10
  - Period = 10 ms
- ThreadB
  - Priority = 11
  - Period = 30 ms
Note: my threads don't have a fixed sleep duration. Each one checks whether it's late or early and adapts its sleep duration accordingly.
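That late/early adjustment can be reduced to deadline arithmetic like the following sketch (not the actual benchmark code; names are invented, and in the real thread the remaining time would be handed to clock_nanosleep). The thread tracks an absolute deadline and sleeps only for what remains, so lateness in one cycle does not accumulate into the next.

```c
#include <assert.h>
#include <stdint.h>

/* Compute how long to sleep to hit the next period boundary.  `deadline`
 * is the wakeup we were aiming for, `now` is when we actually woke up;
 * all times in microseconds.  Writes the following deadline to
 * *next_deadline and returns the sleep duration (0 if already late). */
static int64_t next_sleep_usec(int64_t deadline, int64_t now, int64_t period,
                               int64_t *next_deadline)
{
  *next_deadline = deadline + period;   /* advance by one nominal period   */
  int64_t sleep = *next_deadline - now; /* shorter sleep if we woke late   */
  return sleep > 0 ? sleep : 0;         /* already past the deadline: run  */
}
```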
Tickless
Thread A gets preempted by ThreadB (higher priority)
Here's a description of what's happening:
- The interrupt related to Thread A is triggered.
- The scheduler starts another timer to trigger ThreadB.
- Thread A runs...
- ... until the interrupt related to ThreadB is triggered.
- The scheduler starts another timer: this is the overall sleep that ends the test.
- Thread B is then run; indeed, the priority of B is higher than the priority of Thread A.
- Thread B finishes and the ThreadA stack is put back on the CPU.
Thread A gets triggered right after ThreadB
Here we can see that, even though the interrupt of Thread A was triggered before the end of the ThreadB routine, the delay between the end of the ThreadB routine and the start of the ThreadA routine is quite huge: 115 µs.
With Tick
The "With tick" diagram is much simpler. The thread with the higher priority gets triggered first, then the scheduling takes 60 µs, and then the second thread gets triggered. And there is no preemption.
Comparison with another CPU
I did the same benchmark with the STM32F411 and got the exact same result. Therefore I conclude that these behaviours are not due to my implementation of tickless.
Conclusion
The tickless feature seems quite useful when you want your CPU to sleep most of the time (without a constant tick). But it comes with two costs:
- You have to start and stop the timer... obviously. And this takes time. On the XMC, the delay between the end of one thread and the beginning of the other is around twice as long with tickless as with tick (120 µs against 60 µs).
- With a tick, there's no preemption between the threads. Indeed, on the tick, if two threads are ready to run, the one with the highest priority is scheduled first; a thread with a higher priority won't preempt a lower-priority one mid-run. On the contrary, with tickless, preemption is possible, and the resulting context switches take time...
Unfortunately the alarm implementation of tickless is not possible on the XMC (you can't update the compare value on the fly), and I have no chip where this implementation is available, so I can't compare. But I can imagine the timings are a bit better with that implementation, as the timer doesn't have to be started/stopped all the time.
@trns1997 @leducp @Rdk-T
@nicolas71640 I suggest you try Tickless on some architecture that already has it implemented, because it works fine without these drawbacks you are seeing. My company uses Tickless and RR, and the only issue we found was that the UART was losing some received bytes. After further investigation we discovered that the issue was caused by the RTC. Every time the CPU was awakened from sleep, the RTC needed to initialize the 32 kHz crystal (disabling interrupts) for 400 µs for the wake-up timer (that is, RTC clock/16). It was fixed by changing the prescaler from 16 to 2. I saw you said you tried the STM32F411; maybe the issue is related to something similar to what we faced with the RTC, but perhaps in some other timer initialization, for example.
I have actually tried the tickless mode on the STM32, and I have the exact same behavior. My benchmark (we use an OTSS benchmark for every OS we use), where we have 4 threads with different priorities and different periods, also fails on the STM32F411, with 500 µs of jitter (at 100 µs resolution). The jitter is 200 µs (most of the time 100 µs) without tickless. To be totally honest, I haven't generated for the STM32 the diagrams I have for the XMC, but my benchmark doesn't lie: the jitter is too high... When the threads don't preempt each other, or are far apart from each other, the jitter is really good though, and can be much better than with tick since you can reduce the resolution.
@nicolas71640 Infineon dev boards come with a built-in J-Link if I remember correctly, so I recommend using SystemView to debug time-related problems. This way you can get much more information and find problems that can't be detected with GPIO-based debugging (for example, in this issue: https://github.com/apache/nuttx/issues/6012).
You don't mention what clock resolution you are using. I presume it is a 100 usec period?
I wonder if this could be a quantization error similar to that described here: https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays . If so, the solution would be to increase the timer base frequency.
Other operations can introduce jitter too: disabling interrupts, locking the scheduler, taking mutexes, etc. There was a good discussion some time back involving Sebastien Lorquet where he found that the mechanism he used to wake up tasks could introduce more or less jitter (but I can't remember the details of that; was it this discussion: https://groups.google.com/g/nuttx/c/x5D0rAyMOdo/m/yg40A5ZkAAAJ ?).
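The quantization effect described on that wiki page can be illustrated with the conversion below, a sketch of the rounding rather than NuttX's exact code: the requested delay is rounded up to whole ticks and one extra tick is added to de-bias it (as noted later in this thread), so with a 100 usec tick a 10 usec request becomes 2 ticks, i.e. 200 usec.

```c
#include <assert.h>

#define USEC_PER_TICK 100 /* the default resolution discussed in this thread */

/* Convert a requested delay to timer ticks: round up to whole ticks, then
 * add one tick to de-bias (so the delay is never shorter than requested). */
static unsigned delay_to_ticks(unsigned usec)
{
  unsigned ticks = (usec + USEC_PER_TICK - 1) / USEC_PER_TICK;
  return ticks + 1;
}
```

This is why improving the clock resolution is the only way to shrink this class of jitter: the error is bounded by the tick size, not by how fast the CPU is.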
@patacongo I think you are right: CONFIG_USEC_PER_TICK is set to the default 100 usec. @nicolas71640 I think it might be worth reducing this value to see if it changes anything, but tbf the 100 usec period should be more than sufficient for the tests conducted above.
@raiden00pl Very good to know! I didn't know that NuttX had this integrated! I might have a look. @patacongo @trns1997 Yes, the resolution was 100 µs. I have tested our benchmark with 10 µs. It's better, but very far from being 10 µs precise: with the 4 threads I've mentioned before, the jitter was around 300-400 µs. But how could it be different anyway, when you see that it takes around 100 µs to cancel/start the timer... As I said, I haven't traced the diagram with the 4 threads running, but I can imagine multiple preemptions, cancelling/starting the timer 4 times at some point...
Okay. But jitter up to the LSB value of 100 usec is to be expected and cannot be eliminated in any other way.
With a clock of 100 usec, a delay of 10 usec is impossible. In fact, I would expect to see errors up to 200 usec, since a tick is added to the delay to de-bias it (see again https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays ). If you want to reduce that jitter, you have to improve the clock resolution.
@patacongo I think @nicolas71640 meant that he had reduced the resolution to 10usec for the benchmark test. Right @nicolas71640 ?
Right, my mistake. Then there is no overhead; use the shortest clock period that you can get. I used to use 1 usec when I could.
100 usec to stop/cancel the timer seems excessive.
@nicolas71640 could you squash the patch into one?
@nicolas71640 please squash the change into one patch and fix the nxstyle issue reported by: https://github.com/apache/nuttx/actions/runs/8233183407/job/22512126340?pr=11737
@nicolas71640 thank you very much for your investigation on these issues and for making things work! Kudos!!!