Marlin icon indicating copy to clipboard operation
Marlin copied to clipboard

New encoder logic with debouncing

Open dbuezas opened this issue 1 year ago • 122 comments

Description

This PR is a follow up to #26501. After multiple tests using interrupts, it became obvious that the main problems I observe with the encoder are related to contact bouncing. Since other reported in #26501 that the old step skipping workarounds were needed in their machines to avoid unresponsiveness when changing direction, I conclude that the contact bouncing issue is present in many boards.

Luckily, update_buttons is called extremely frequently (I measure 180kHz in my SKR3) so a very simple debouncing scheme can be used without the need for interrupts.

This PR:

  • Adds debouncing logic for the encoder: encoder inputs need to be stable for 2ms before their value is used.
  • Disables clicks if too close to a scroll (100ms): These are almost always accidental
  • Removes half step timeout logic for half clicks and other workarounds: These result in the "clicky" haptic of the encoder getting "out of synch" with UI scrolls, and the symptoms it treated are now solved via debouncing.

Requirements

Any board with a rotary encoder.

Benefits

Rock solid encoder readings. No false readings and no missed steps. The encoder finally feels right.

Configurations

Related Issues

#26501 #26605

dbuezas avatar Jan 23 '24 21:01 dbuezas

Before

Very jumpy, often goes backwards.

https://github.com/MarlinFirmware/Marlin/assets/777196/8a9c6eb7-bed3-47fa-a8d6-56f3b5528d02

After

Direction and responsiveness are fixed.

https://github.com/MarlinFirmware/Marlin/assets/777196/94f9aec5-efc6-4511-ba31-32338f52b2b2

dbuezas avatar Jan 23 '24 21:01 dbuezas

Note: I avoided using interrupts or any type of counting to cover slower boards with less interrupt pins.

The changes increase RAM usage by 14 bytes as static vars:

  • 4 bytes store the last time the encoder moved (to inhibit clicks)
  • 4 bytes for the time of the last change of EN_A
  • 4 bytes same for EN_B
  • 1 byte for the last undebounced (raw) encoder pins' state
  • 1 byte for the current debounced encoder state

dbuezas avatar Jan 23 '24 23:01 dbuezas

I just put a call out in our #testing channel on Discord to collect feedback on this PR.

thisiskeithb avatar Jan 24 '24 04:01 thisiskeithb

Testing this with DWIN_MARLINUI_LANDSCAPE on a genuine DWIN (not that it should matter, an encoder is an encoder). No issues. For me the effect is more subtle, but it is more consistent for sure.

The-EG avatar Jan 25 '24 19:01 The-EG

Thanks for the test @The-EG, do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)

dbuezas avatar Jan 25 '24 20:01 dbuezas

do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)

32bit. I didn't have pronounced issues in the first place, so I think the improvement is less noticeable for me, but the biggest thing I noticed is when making a large change (ie. spinning the encoder quickly), the reaction isn't as jumpy and feels more in line with the motion. I can't say I noticed any issues with reversing, either before or after.

I'll see if I can try this on my CR10_STOCKDISPLAY later, that one had more issues than this DWIN (menu selection going too far/not far enough, etc.), even on the exact same board. I suspect the improvement may be more noticeable there.

The-EG avatar Jan 25 '24 20:01 The-EG

Sadly the same problem as with #26501 for me: The encoder loses steps when changing direction. When changing the scrolling direction, you need to go two encoder detents for the menu to actually move the first item up/down with this PR applied. Doesn't always happen, but once the bug is triggered, it seems to stay until the printer is powered down. Setup: BTT E3 Mini SKR v1.2 with CR10_STOCKDISPLAY, bugfix-2.1.x @ 22fc07d72b.

Unrelated: Am I reading the code wrong, or should BLOCK_CLICK_AFTER_MOVEMENT_MS rather be named BLOCK_MOVEMENT_AFTER_CLICK_MS? 🤔

XDA-Bam avatar Jan 26 '24 19:01 XDA-Bam

That's a bummer, but it means that it is not just a lost step. There must be something else going on.

@XDA-Bam I have a couple questions you could help me with:

  • Does the defect happen in both directions (changing from left to right and also from right to left)?
  • Excluding this problem, does the encoder work more consistently otherwise in your printer?
  • Could you upload your configuration files here?

Maybe I can reproduce this phenomenon in my machine by matching your config.

Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)

dbuezas avatar Jan 26 '24 20:01 dbuezas

That's a bummer, but it means that it is not just a lost step. There must be something else going on.

@XDA-Bam I have a couple questions you could help me with:

* Does the defect happen in both directions (changing from left to right and also from right to left)?

Yes, both.

* Excluding this problem, does the encoder work more consistently otherwise in your printer?

I think so, yes. More consistent overall.

* Could you upload your configuration files here?

Attached below. Marlin.zip

Maybe I can reproduce this phenomenon in my machine by matching your config.

Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)

OK, thanks :)

XDA-Bam avatar Jan 26 '24 20:01 XDA-Bam

@XDA-Bam could you make a video where you show the phenomenon? Please also include slowly going multiple steps in the same direction too. Thank you!

dbuezas avatar Jan 27 '24 10:01 dbuezas

Video is attached. You will see that this time, it took me quite some time (~25 s) to trigger the bug while scrolling the menus. After that, it stayed and affected all direction changes. In other occasions, the bug was triggered sooner. I'm not sure if I also encountered instances where it was triggered immediately after booting the printer, but there might have been some.

https://github.com/MarlinFirmware/Marlin/assets/1209896/f321e1fd-ed8e-452a-aed4-de63cc752f70

XDA-Bam avatar Jan 28 '24 13:01 XDA-Bam

I'll keep trying to reproduce this for some days. I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.

If I can't reproduce it this week, I'll assume this is the case and I'll re-add the reset trick to this branch, but combined with the debouncer.

dbuezas avatar Jan 29 '24 21:01 dbuezas

I'll keep trying to reproduce this for some days. I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.

It's possible. I just managed to "un-trigger" the bug again after being triggered at start-up. Just scrolled around fast for 15 seconds, and bam: Menu moves and encoder steps were aligned correctly again, no lost moves after direction changes. A bit later, the bug re-triggered. So it can fix itself without power cycling the printer.

However, I played around with different fixes assuming exactly this problem (single pulse offset) on your previous PR and could never prevent the bug from appearing. But I might have "fixed" it incorrectly, of course 😅

I remember that I zeroed in on https://github.com/MarlinFirmware/Marlin/blob/bb367670a28b24ef4383da81541916752f0ed682/Marlin/src/lcd/marlinui.cpp#L1105 because for encoders with multiple pulses per step, we're effectively rounding towards zero here if the encoder is misaligned. In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly in the following couple of lines.

XDA-Bam avatar Jan 29 '24 22:01 XDA-Bam

I committed a fix to this PR would you please give it another go @XDA-Bam ?


I just managed to "un-trigger" the bug again

All right! this confirms it then.

In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly

Exactly!

At first it works

physical dent x x x
direction -> -> -> <- <- <- <- <-
encoderDiff 0 1 2 3 0 -1 -2 -3 0
encoderPosition 0 1 0 <--- works fine

Then it slips and we land here

physical dent x x x
direction -> -> -> <- <- <- <- <-
encoderDiff -1 0 1 2 3 2 1 0 -1
encoderPosition 0 0 0 0 <--- going back and forth does nothing

The fix candidate

If the encoderDiff is still for >500ms, reset it to zero. It is a lot simpler than the previous logic, and it shouldn't have the side effect of desynchronising the physical dents with the encoder position all the time (noticeable on encoders with 2 steps per dent)

physical dent x x x x
direction RESET -> -> -> <- <- <- <- <-
encoderDiff -1 0 1 2 3 0 -1 -2 -3 0
encoderPosition 0 0 1 0 <-- fixed

dbuezas avatar Jan 30 '24 20:01 dbuezas

@dbuezas With your latest changes, the bug has changed and it subjectively appears to trigger less often. What happens now is that you can still get occasions where one step is lost after a direction change. You can then go one encoder dent left->right->left->right and so on without the menu reacting. But as soon as you move two encoder dents in either direction, the bug disappears and the menu and encoder movements are synced again. The encoder then continues to function as expected until the bug triggers again, which can then be fixed again by moving two dents in either direction.


I also tried debugging this a bit and tested the following code before your latest commit (replaces this):

  #endif // ENCODER_RATE_MULTIPLIER

  int8_t fullSteps = encoderDiff / epps;
  int8_t encoderError = encoderDiff - fullSteps*epps;
  if (encoderError != 0) {
    if (encoderDiff < 0) {
      NOMORE(fullSteps, -1);
    }
    else {
      NOLESS(fullSteps, 1);
    }
  } 
  
  if (fullSteps != 0) {
    next_encoder_enable_ms = ms + BLOCK_CLICK_AFTER_MOVEMENT_MS;
    encoderDiff = encoderError != 0 ? 0 : encoderError;
    if (can_encode() && !lcd_clicked)
      encoderPosition += (fullSteps * encoderMultiplier);
    }
  }

This behaves very similar to the current code in this PR: I still get occasions where a single encoder dent is ignored on direction changes, but the bug fixes itself if I move a couple of dents in one direction. My conclusion from this would be, that encoderDiff is actually zero in cases where a rotation by one dent doesn't result in a move in the menu, because otherwise the above code should force abs(fullSteps)>0. I find that very odd, but I'm also not sure I fully understood the overall encoder code correctly.

XDA-Bam avatar Jan 31 '24 12:01 XDA-Bam

I see. I didn't think of that.

But as soon as you move two encoder dents in either direction, the bug disappears

It should disappear as soon as the encoder is left alone for half a second.

Anyway, this explains why the old code had some extra condition in which a full step would be taken if the encoder stayed still for a bit beyond half a full step. The issue I have with the old code is that it assumes the frequency at which the function is called, so it is kind of brittle.

I'll try reading the original workaround soon, maybe it coexists fine with my debouncing logic and we can finally get all kinks ironed out :)

Thanks for your tests and feedback @XDA-Bam !

dbuezas avatar Jan 31 '24 16:01 dbuezas

It should disappear as soon as the encoder is left alone for half a second.

Forgot to test that earlier. Just checked and yes: The bug is also cleared once you leave the encoder "resting" for a second or so 👍

That also explains why it's much harder to trigger the bug after the latest commit: The half second reset will typically clear the bug before you even notice it's happened.

XDA-Bam avatar Jan 31 '24 16:01 XDA-Bam

Exactly! You can also reduce the resting time to 250ms, if that's perfect already maybe we don't need to re-add the old (more complicated) code.

dbuezas avatar Jan 31 '24 19:01 dbuezas

I tried the suggested #define RESET_ENCODER_AFTER_MS 250 and didn't feel a substantial difference. It may be quicker to clear the error, but it's closing in on a level where it becomes difficult to tell without motor-driven test setups and high-speed cams 😉 But I also saw no negative effects, so I would tend to keep the shorter reset time.

What is (still?) there is the occasional lost menu step when entering submenus. If I enter "Configuration" from the main menu, the first movement / encoder dent in that submenu is sometimes ignored. I think that was also present in the previous encoder improvement PR. Is there perhaps some deadtime hardcoded in some random location if lcd_clicked gets triggered?

XDA-Bam avatar Jan 31 '24 20:01 XDA-Bam

Ok, then we're getting somewhere:) I'll spend some time looking at the old logic and the new one and push some new code this week.

Is there perhaps some deadtime hardcoded in some random location

Yes, and it bothers me too! There's a deadtime for all inputs, which makes sense for the click and button/touch based navigation, but not for the encoder. I'm actually using a modded version of this branch that eliminates that. I'll work on that after this pr :)

dbuezas avatar Jan 31 '24 21:01 dbuezas

I tried the old reset algorithm and it makes my encoder (2 pulses per step) feel all mushy. If I move the wheel one step at normal speeds, it will often reset ecoderDiff midway, resulting in an unresponsive UI.

The old code resets every 100ms or less, so any speed rate of less than 10 pulses per second results in no UI response. 10 pulses per second are 5 steps/s in my encoder and 2.5 in yours, which explains why it makes my setup worse while yours is ok. Buuuut it also means the reset timeout should be measured per step and not per pulse!

I think a good balance is to have the reset timeout be 1/pulses_per_step seconds (250ms in your setup and 500ms in mine). Also to keep the old logic that would advance a step instead of resetting when encoderDiff = ±3

dbuezas avatar Feb 01 '24 19:02 dbuezas

All right @XDA-Bam. I re-added the old logic. The difference is that a reset may result in a full step movement if encoderDiff is past half a step (i.e ±3 pulses in your encoder). Excluding the debouncing, the only difference between this part and the original logic is that the reset happens after RESET_ENCODER_AFTER_MS = 1000ms/ENCODER_PULSES_PER_STEP (250ms in your case) instead of depending on how often MarlinUI::update() is called.

This still works fine in my encoder so I'd prefer RESET_ENCODER_AFTER_MS it not to be too much below 1000ms/ENCODER_PULSES_PER_STEP. But if this is still not acceptable in your setup, then please try with 500 and let me know.

Thanks for your tests! I hope I can stop bothering you with this very soon :). Worse case I revert that part of the PR (but keep the debouncing logic) and live forever with a custom branch for my printer.

dbuezas avatar Feb 02 '24 23:02 dbuezas

Just updated to 9e21330d7ae8b4979fee10099294e9e61c939613 and tested the latest changes. I specifically also tested slow scrolling, which isn't how I typically navigate the menus. I think I noticed the occasional lost step. I then tested #define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP), which should revert to 500 ms reset delay for my machine and found that to be superior. Therefore, I change my previous recommendation to go with 250 ms and would now advice to go with the old value wich is now equivalent to 2000/ENCODER_PULSES_PER_STEP.

I also quickly noticed occasional double stepping. Overall, my impression is that the encoder feels just as mushy and imprecise as before with these latest changes. I also tested some of my own code last week which rounded/ceiled fullSteps in case of 0<encoderDiff<epps and saw a similar effect, where I would just randomly get unexpected double steps. Overall, this makes the encoder much less predictable than ocasionally "forgetting" a step in my opinion. I would therefore vote to remove the logic of rounding up if the encoder is past half a step that was re-introduced in 92c4c7d36cf8af076500f6dc9a9a5d3c93990b82.

If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further and other testers should check how the new logic behaves on their machine.

XDA-Bam avatar Feb 03 '24 12:02 XDA-Bam

Overall, this makes the encoder much less predictable than occasionally "forgetting" a step in my opinion

I strongly agree

If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further

So specifically that's the previous commit (https://github.com/MarlinFirmware/Marlin/pull/26723/commits/c88345d99e7e15ac702a0bdc24e25d467dbd112d) but with

#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)

Right?

I'll update after your confirmation and then we should probably le the maintainers know.

dbuezas avatar Feb 03 '24 13:02 dbuezas

If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further

So specifically that's the previous commit (c88345d) but with

#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)

Right?

Yes, exactly.

XDA-Bam avatar Feb 03 '24 13:02 XDA-Bam

There's one (last?) trick we could try: only apply the threshold rule is the direction has changed

dbuezas avatar Feb 03 '24 15:02 dbuezas

I would expect the encoder to jump nonetheless, but it takes like 3 minutes to test so let's test.

XDA-Bam avatar Feb 03 '24 15:02 XDA-Bam

Ok, @XDA-Bam, I added the last test. It "advances the encoder after a while if it has crossed the half a step threshold while changing direction"

There's now two timeouts

#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)
#define ADVANCE_ENCODER_AFTER_MS (200/ENCODER_PULSES_PER_STEP)

I believe this will make your encoder rock solid (mine is unaffected and already solid).

I'll let you and the maintainers judge if the extra code is worth it :)

dbuezas avatar Feb 04 '24 20:02 dbuezas

@dbuezas Just tested 2100d45 and I'd say it is an improvement over 92c4c7d but still occasionally double steps. I certainly provoked it by intentionally changing directions quite often, but in testing we're explicitly looking for edge cases so I think that's fair 😁 . Just as before, this is also getting hard to test reliably by hand.

Considering the increased complexity for tracking lost half steps, I would still vote to keep it simple and go back to c88345d plus #define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP).

XDA-Bam avatar Feb 04 '24 21:02 XDA-Bam

Sounds good. I'll be happy to see that nasty hack go

dbuezas avatar Feb 04 '24 22:02 dbuezas