BIGTREETECH-TouchScreenFirmware icon indicating copy to clipboard operation
BIGTREETECH-TouchScreenFirmware copied to clipboard

Relax OS_Time IRQHandler

Open kisslorand opened this issue 1 year ago • 38 comments

Requirements

BTT or MKS TFT.

Description

This PR is an attempt to ease up the OS timer IRQ handler. Major credit goes to @rondlh who brought it to our attention here: https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/issues/2832. More than that @rondlh made several suggestions to make the IRQ handler lighter on the MCU, suggestions that are included in this PR. The extra changes are:

  • remove modulo operation which is expensive and replace it with bare simple arithmetic ones
  • ~remove the usage of any conditional and ternary from one of the functions called in that interrupt handler and replaced it with bare arithmetic operations~
  • moved the interrupt flag reset to the end of the operations to avoid any unpredictable behaviour of the IRQ handler

Benefits

In theory it speeds up the TFT.

kisslorand avatar Sep 10 '23 13:09 kisslorand

Brilliant improvement for updatePrintTime, but why did you make void loopTouchScreen 2 to 3 times slower than before? Sabotage?

rondlh avatar Sep 10 '23 14:09 rondlh

Sorry, my sense of humor might be very different from yours, I really cannot decide it was meant as a joke or you are asking for real.

kisslorand avatar Sep 10 '23 15:09 kisslorand

Sorry, my sense of humor might be very different from yours, I really cannot decide it was meant as a joke or you are asking for real.

I'm asking for real, your implementation is 2-3 times slower than before, the sabotage part is obviously a joke.

rondlh avatar Sep 10 '23 15:09 rondlh

Can you pleased share the benchmark you used to measure the performance decrease? All I see is performance increase there.

kisslorand avatar Sep 10 '23 15:09 kisslorand

Can you pleased share the benchmark you used to measure the performance decrease? All I see is performance increase there.

Show me... I theoretically can see your implementation is much slower

rondlh avatar Sep 10 '23 15:09 rondlh

Show what?

  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);

Are we talking about these 2 lines?

kisslorand avatar Sep 10 '23 15:09 kisslorand

Yes, I'm talking about loopTouchScreen. You say you see performance increase, what did you see/find? What test? Any numbers?

rondlh avatar Sep 10 '23 15:09 rondlh

How did you come to the conclusion it is 2-3 slower?

Just to clarify: there's no sabotage intention, I spent quite some time to try to increase the performance of that function.

kisslorand avatar Sep 10 '23 15:09 kisslorand

How did you came to the conclusion it is 2-3 slower?

Like I said, theoretically I can see that. But you said:

All I see is performance increase there.

So the question is... how did you see performance improvement?

rondlh avatar Sep 10 '23 15:09 rondlh

I have the same question, how do you see performance decrease?

kisslorand avatar Sep 10 '23 15:09 kisslorand

There is "an issue" in your code, similar to the issue in "ADVANCED_OK", that stops it from being efficient.

rondlh avatar Sep 10 '23 15:09 rondlh

You said there's no need for benchmarks, but you just intrigued me. I will make some benchmarks and will come back with the results. It might take an hour or so.

kisslorand avatar Sep 10 '23 15:09 kisslorand

OK, take it easy. May I ask what nationality you are? (I'm Dutch)

rondlh avatar Sep 10 '23 15:09 rondlh

I am already on it. ;)

kisslorand avatar Sep 10 '23 15:09 kisslorand

I have the results. :) CodeBattle

kisslorand avatar Sep 10 '23 15:09 kisslorand

How did you test? Can you show me the code?

rondlh avatar Sep 10 '23 16:09 rondlh

Actually the two codes compiled do perform the same, the compiler does its job quite well. The difference above could be a rounding issue as the timer is counting microseconds (I did more tests and the results were identical for both code versions). The previous test were done on a Cortex-M4 MCU, I have the results on a Cortex-M3 MCU also:

CodeBattle1

I'm aware that it is harder to understand the code I wrote, I have no issue undoing it and keeping the original code since I know now that the compiler knows how to optimize it.

This test was interesting for me, I wasn't aware that a Cortex-M4 MCU is so much faster than a Cortex-M3.

kisslorand avatar Sep 10 '23 16:09 kisslorand

May I ask what nationality you are? (I'm Dutch)

There's no secret, I am Hungarian. You could already tell that since I made many additions/changes to the Hungarian language pack.

kisslorand avatar Sep 10 '23 19:09 kisslorand

My tests show that your loopTouchScreen is only 25% slower than the original code (STM32F207), not the 2-3 times I theoretically expected. Only code readability has decreased by a factor of 2-3.

You can make loopTouchScreen a tiny bit faster by dropping the guard inversion, and swapping the normal and else case. This trick can also be used elsewhere in the code, but the benefit is very small. The DMA serial writing I have under test has significantly more impact.

Slightly faster code:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (XPT2046_Read_Pen())
  {
    touchScreenIsPress = false;
    touch = 0;
  }
  else
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
    }
    else
    {
      touch++;
    }
  }
}

rondlh avatar Sep 11 '23 01:09 rondlh

What speed increase did your test show for your proposed code?

kisslorand avatar Sep 11 '23 04:09 kisslorand

What speed increase did your test show for your proposed code?

A few percent only.

Your modulo calculation replacement using a counter is faster. Usually you should count down if you do this, not up, but for modern compilers it might be irrelevant. For the TFT performance it has virtually no impact. Confirmed: Counting down is about 15% faster than counting up (only considering the counting part).

So overall, the conclusions in this PR: Cons:

  • loopTouchScreen has become about 25% slower
  • loopTouchScreen is completely unreadable and unmaintainable
  • most of the changes are just copied from #2824 Pros
  • TIM7_IRQHandler is a bit faster (still not optimal)
  • Some unused code was removed

rondlh avatar Sep 11 '23 05:09 rondlh

Opinions are always welcome but taken with a grain of salt.

kisslorand avatar Sep 11 '23 19:09 kisslorand

Where is the opinion in this? I only see facts.

  • loopTouchScreen has become about 25% slower

This is a fact, even your own tests show that at best loopTouchScreen has not become faster, only more unreadable. That your code is slower was obvious to me immediately when I saw it, although I overestimated the speed decrease that it causes. Look again at the code... it's obvious! If you still don't see it then I can show it to you.

  • loopTouchScreen is completely unreadable and unmaintainable

Fact, you also agree to this above.

  • most of the changes are just copied from

Very likely, or perhaps you came up with the exact same idea a day after @digant73 did

  • TIM7_IRQHandler is a bit faster (still not optimal)

Fact, like I showed you above, you need to count down, not up to be faster.

  • Some unused code was removed

Fact (+43 −293), but if you deny it I'm willing to take your side.

Your code should not be merged in it's current state! I don't understand that you keep "force pushing" a merge while being aware of the issues in your code.

BTW: Force-pushing slower and less readable code is sabotage in my book

rondlh avatar Sep 12 '23 13:09 rondlh

Can somebody please review this code change force-pushed by @kisslorand? Please consider execution speed and code readability. In my view it's obvious, but I still did a benchmark which confirmed my suspicions. This is a routine handling the touch screen presses of the TFT and is executed 1000x a second.

ORIGINAL CODE:

static volatile bool touchScreenIsPress = false;
void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (!XPT2046_Read_Pen())
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
	  lastTouchTime = OS_GetTimeMs(); // IRON, ADDED TO RECORD TOUCH TIME
    }
    else
    {
      if (OS_GetTimeMs() - lastTouchTime > 250) // IRON, PREVENT TOO FAST TOUCHING
      touch++;
    }
  }
  else
  {
    touchScreenIsPress = false;
    touch = 0;
  }
}

@kisslorand VERSION: (comments removed that hold the original version, I wonder why those were left in the code)

static volatile bool touchScreenIsPress = false;
void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);
}

UPDATE: @kisslorand has seen the light and updated loopTouchScreen to:

void loopTouchScreen(void)  // Handle in interrupt
{
  if (XPT2046_Read_Pen() == LOW)
    touchCountdown -= !!touchCountdown;
  else
    touchCountdown = TOUCH_DEBOUNCE_MS;
}

Eliminating a variable (touchScreenIsPress) makes sense as it basically represented the same thing as touch was. It's a bit of a dirty trick with touchCountdown, but this should work fine and is a bit faster, good job.

rondlh avatar Sep 12 '23 14:09 rondlh

@kisslorand has seen the light

Actually what I've seen is the rookie mistake I made benchmarking the original function and mine. In the benchmark I didn't use the variables from the functions benchmarked, the compiler optimized the code and eliminated the unused variables. A lesson for everybody, you are not benchmarking the code you wrote, you are benchmarking the code the compiler generates!

After the realization of my mistake I redid the tests, the results are the following:

Benchmark

Where "My new code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
  if (XPT2046_Read_Pen() == LOW)
    touchCountdown -= !!touchCountdown;
  else
    touchCountdown = TOUCH_DEBOUNCE_MS;
}

"My old code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);
}

"Ori code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
    static uint8_t touch;
    if (!XPT2046_Read_Pen())
    {
      if (touch >= 20)  // 20ms
      {
        touchScreenIsPress = true;
      }
      else
      {
        touch++;
      }
    }
    else
    {
      touchScreenIsPress = false;
      touch = 0;
    }
}

"Dumb code" is same as "Ori code" but variables not being used, it can be observed how the optimization of compiler the kicks in.

So @rondlh was right, my old code was 20% slower than the original code (not 25% as he stated, but who knows, maybe my tests are still inaccurate).

I have yet to verify the up/down count speed theory, although it is about the comparison to 0 (zero) and not the direction of the count. The specialized literature mentions that it was a thing in the 1980s but modern compilers and MCUs do not exhibit this behaviour but better test it, right?

Note: negative criticism is good as long as it is done in a civilized manner. Github also have this rule: "criticize the idea, not the person".

kisslorand avatar Sep 13 '23 20:09 kisslorand

Up/Down count and comparison to zero results:

Count

Upcount: count from 0 to X Downcount: count from X + 10 down to 10 CompToZero: count from X down to 0

The difference is clearly from comparison to 0 (zero), not from the count direction. The results of @rondlh for down count being more faster than up count is probably because he tested a down count to 0 (zero). It is the comparison to 0 (zero) that makes down count faster. Anyhow I can only see a 3% difference, not 15%, but in any case it is a free trick for speed.

Everyday we learn something...

kisslorand avatar Sep 13 '23 20:09 kisslorand

What do you aim to achieve by moving `TIMER_INTF(TIMER6) &= (uint16_t)~(1<<0); to the back of the ISR? Generally that's not the right thing to do, but I'm curios to hear your explanation.

rondlh avatar Sep 14 '23 19:09 rondlh

In many microcontroller architectures, including those that use the STM32 series with the STM32 HAL library, clearing the interrupt flag at the end of the ISR is the recommended and standard practice (check TIM3, the buzzer interrupt). This is to ensure that the ISR completes its main tasks before allowing the processor to respond to new interrupts, minimizing interrupt latency and potential issues with nested interrupts. In our case in reality it is irrelevant where's the interrupt reset done in the ISR for TIM7. I guess I did it out of habit.

kisslorand avatar Sep 14 '23 19:09 kisslorand

I also think it is irrelevant in this case because the ISR will terminate long before the next interrupt arrives.

If you have experience with this kind of thing then perhaps you can have a look at the serial write DMA interrupt handler I'm testing. I still have incidental issues (typically 1 byte lost, not received by the host). It's build on top of #2824, not sure if it will work for the current source base. (STM32F2_4 platform only!). src-user-HAL-DMA Writing.zip srt-user-API-SerialConnection.zip

I have been testing this with the serial line idle interrupt disabled (manual update of wIndex in Serial_Get). That seems to work fine (still testing), but when I enable the serial line idle interrupt then I see some rare issues (1 byte lost, not received by the host). Note that I use the same ISR for the serial idle interrupt (reading) and the DMA serial writing (transfer complete interrupt). I don't use the DMA transfer complete interrupt because that is very complex, but probably it's the better choice. Any insight would be appreciate.

UPDATE: Nice to see you like it. I think the code is actually fine. I slightly improved it in a search to find an issue I am having. But the issue is not causes by the DMA transfer. I have some data corruption at very high print speeds (> 500%), sometimes a byte is missed at the host, about 1 byte in 6MB. But I see the same issue when using Interrupt based serial writing and unbuffered writing. Serial speed is at 250k Baud, now I'm testing at 115200 baud to see if that solves the issue.

rondlh avatar Sep 15 '23 07:09 rondlh

#2824 now contains a performance benchmark that you can enable in Configuration.h (DEBUG_MONITORING). When enabled, in the notification menu you will find another button to bring you to the monitoring screen. The scan rate will show you how many times the loopProcess is executed. This way you can test the impact of your improvement easily.

rondlh avatar Sep 21 '23 17:09 rondlh