Marlin
Marlin copied to clipboard
New encoder logic with debouncing
Description
This PR is a follow up to #26501. After multiple tests using interrupts, it became obvious that the main problems I observe with the encoder are related to contact bouncing. Since other reported in #26501 that the old step skipping workarounds were needed in their machines to avoid unresponsiveness when changing direction, I conclude that the contact bouncing issue is present in many boards.
Luckily, update_buttons is called extremely frequently (I measure 180kHz in my SKR3) so a very simple debouncing scheme can be used without the need for interrupts.
This PR:
- Adds debouncing logic for the encoder: encoder inputs need to be stable for 2ms before their value is used.
- Disables clicks if too close to a scroll (100ms): These are almost always accidental
- Removes half step timeout logic for half clicks and other workarounds: These result in the "clicky" haptic of the encoder getting "out of synch" with UI scrolls, and the symptoms it treated are now solved via debouncing.
Requirements
Any board with a rotary encoder.
Benefits
Rock solid encoder readings. No false readings and no missed steps. The encoder finally feels right.
Configurations
Related Issues
#26501 #26605
Before
Very jumpy, often goes backwards.
https://github.com/MarlinFirmware/Marlin/assets/777196/8a9c6eb7-bed3-47fa-a8d6-56f3b5528d02
After
Direction and responsiveness are fixed.
https://github.com/MarlinFirmware/Marlin/assets/777196/94f9aec5-efc6-4511-ba31-32338f52b2b2
Note: I avoided using interrupts or any type of counting to cover slower boards with less interrupt pins.
The changes increase RAM usage by 14 bytes as static vars:
- 4 bytes store the last time the encoder moved (to inhibit clicks)
- 4 bytes for the time of the last change of EN_A
- 4 bytes same for EN_B
- 1 byte for the last undebounced (raw) encoder pins' state
- 1 byte for the current debounced encoder state
I just put a call out in our #testing channel on Discord to collect feedback on this PR.
Testing this with DWIN_MARLINUI_LANDSCAPE on a genuine DWIN (not that it should matter, an encoder is an encoder). No issues. For me the effect is more subtle, but it is more consistent for sure.
Thanks for the test @The-EG, do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)
do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)
32bit. I didn't have pronounced issues in the first place, so I think the improvement is less noticeable for me, but the biggest thing I noticed is when making a large change (ie. spinning the encoder quickly), the reaction isn't as jumpy and feels more in line with the motion. I can't say I noticed any issues with reversing, either before or after.
I'll see if I can try this on my CR10_STOCKDISPLAY later, that one had more issues than this DWIN (menu selection going too far/not far enough, etc.), even on the exact same board. I suspect the improvement may be more noticeable there.
Sadly the same problem as with #26501 for me: The encoder loses steps when changing direction. When changing the scrolling direction, you need to go two encoder detents for the menu to actually move the first item up/down with this PR applied. Doesn't always happen, but once the bug is triggered, it seems to stay until the printer is powered down. Setup: BTT E3 Mini SKR v1.2 with CR10_STOCKDISPLAY, bugfix-2.1.x @ 22fc07d72b.
Unrelated: Am I reading the code wrong, or should BLOCK_CLICK_AFTER_MOVEMENT_MS rather be named BLOCK_MOVEMENT_AFTER_CLICK_MS? 🤔
That's a bummer, but it means that it is not just a lost step. There must be something else going on.
@XDA-Bam I have a couple questions you could help me with:
- Does the defect happen in both directions (changing from left to right and also from right to left)?
- Excluding this problem, does the encoder work more consistently otherwise in your printer?
- Could you upload your configuration files here?
Maybe I can reproduce this phenomenon in my machine by matching your config.
Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)
That's a bummer, but it means that it is not just a lost step. There must be something else going on.
@XDA-Bam I have a couple questions you could help me with:
* Does the defect happen in both directions (changing from left to right and also from right to left)?
Yes, both.
* Excluding this problem, does the encoder work more consistently otherwise in your printer?
I think so, yes. More consistent overall.
* Could you upload your configuration files here?
Attached below. Marlin.zip
Maybe I can reproduce this phenomenon in my machine by matching your config.
Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)
OK, thanks :)
@XDA-Bam could you make a video where you show the phenomenon? Please also include slowly going multiple steps in the same direction too. Thank you!
Video is attached. You will see that this time, it took me quite some time (~25 s) to trigger the bug while scrolling the menus. After that, it stayed and affected all direction changes. In other occasions, the bug was triggered sooner. I'm not sure if I also encountered instances where it was triggered immediately after booting the printer, but there might have been some.
https://github.com/MarlinFirmware/Marlin/assets/1209896/f321e1fd-ed8e-452a-aed4-de63cc752f70
I'll keep trying to reproduce this for some days. I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.
If I can't reproduce it this week, I'll assume this is the case and I'll re-add the reset trick to this branch, but combined with the debouncer.
I'll keep trying to reproduce this for some days. I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.
It's possible. I just managed to "un-trigger" the bug again after being triggered at start-up. Just scrolled around fast for 15 seconds, and bam: Menu moves and encoder steps were aligned correctly again, no lost moves after direction changes. A bit later, the bug re-triggered. So it can fix itself without power cycling the printer.
However, I played around with different fixes assuming exactly this problem (single pulse offset) on your previous PR and could never prevent the bug from appearing. But I might have "fixed" it incorrectly, of course 😅
I remember that I zeroed in on
https://github.com/MarlinFirmware/Marlin/blob/bb367670a28b24ef4383da81541916752f0ed682/Marlin/src/lcd/marlinui.cpp#L1105
because for encoders with multiple pulses per step, we're effectively rounding towards zero here if the encoder is misaligned. In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly in the following couple of lines.
I committed a fix to this PR would you please give it another go @XDA-Bam ?
I just managed to "un-trigger" the bug again
All right! this confirms it then.
In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly
Exactly!
At first it works
| physical dent | x | x | x | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| direction | -> | -> | -> | <- | <- | <- | <- | <- | ||
| encoderDiff | 0 | 1 | 2 | 3 | 0 | -1 | -2 | -3 | 0 | |
| encoderPosition | 0 | 1 | 0 | <--- works fine |
Then it slips and we land here
| physical dent | x | x | x | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| direction | -> | -> | -> | <- | <- | <- | <- | <- | ||
| encoderDiff | -1 | 0 | 1 | 2 | 3 | 2 | 1 | 0 | -1 | |
| encoderPosition | 0 | 0 | 0 | 0 | <--- going back and forth does nothing |
The fix candidate
If the encoderDiff is still for >500ms, reset it to zero. It is a lot simpler than the previous logic, and it shouldn't have the side effect of desynchronising the physical dents with the encoder position all the time (noticeable on encoders with 2 steps per dent)
| physical dent | x | x | x | x | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| direction | RESET | -> | -> | -> | <- | <- | <- | <- | <- | ||
| encoderDiff | -1 | 0 | 1 | 2 | 3 | 0 | -1 | -2 | -3 | 0 | |
| encoderPosition | 0 | 0 | 1 | 0 | <-- fixed |
@dbuezas With your latest changes, the bug has changed and it subjectively appears to trigger less often. What happens now is that you can still get occasions where one step is lost after a direction change. You can then go one encoder dent left->right->left->right and so on without the menu reacting. But as soon as you move two encoder dents in either direction, the bug disappears and the menu and encoder movements are synced again. The encoder then continues to function as expected until the bug triggers again, which can then be fixed again by moving two dents in either direction.
I also tried debugging this a bit and tested the following code before your latest commit (replaces this):
#endif // ENCODER_RATE_MULTIPLIER
int8_t fullSteps = encoderDiff / epps;
int8_t encoderError = encoderDiff - fullSteps*epps;
if (encoderError != 0) {
if (encoderDiff < 0) {
NOMORE(fullSteps, -1);
}
else {
NOLESS(fullSteps, 1);
}
}
if (fullSteps != 0) {
next_encoder_enable_ms = ms + BLOCK_CLICK_AFTER_MOVEMENT_MS;
encoderDiff = encoderError != 0 ? 0 : encoderError;
if (can_encode() && !lcd_clicked)
encoderPosition += (fullSteps * encoderMultiplier);
}
}
This behaves very similar to the current code in this PR: I still get occasions where a single encoder dent is ignored on direction changes, but the bug fixes itself if I move a couple of dents in one direction. My conclusion from this would be, that encoderDiff is actually zero in cases where a rotation by one dent doesn't result in a move in the menu, because otherwise the above code should force abs(fullSteps)>0. I find that very odd, but I'm also not sure I fully understood the overall encoder code correctly.
I see. I didn't think of that.
But as soon as you move two encoder dents in either direction, the bug disappears
It should disappear as soon as the encoder is left alone for half a second.
Anyway, this explains why the old code had some extra condition in which a full step would be taken if the encoder stayed still for a bit beyond half a full step. The issue I have with the old code is that it assumes the frequency at which the function is called, so it is kind of brittle.
I'll try reading the original workaround soon, maybe it coexists fine with my debouncing logic and we can finally get all kinks ironed out :)
Thanks for your tests and feedback @XDA-Bam !
It should disappear as soon as the encoder is left alone for half a second.
Forgot to test that earlier. Just checked and yes: The bug is also cleared once you leave the encoder "resting" for a second or so 👍
That also explains why it's much harder to trigger the bug after the latest commit: The half second reset will typically clear the bug before you even notice it's happened.
Exactly! You can also reduce the resting time to 250ms, if that's perfect already maybe we don't need to re-add the old (more complicated) code.
I tried the suggested #define RESET_ENCODER_AFTER_MS 250 and didn't feel a substantial difference. It may be quicker to clear the error, but it's closing in on a level where it becomes difficult to tell without motor-driven test setups and high-speed cams 😉 But I also saw no negative effects, so I would tend to keep the shorter reset time.
What is (still?) there is the occasional lost menu step when entering submenus. If I enter "Configuration" from the main menu, the first movement / encoder dent in that submenu is sometimes ignored. I think that was also present in the previous encoder improvement PR. Is there perhaps some deadtime hardcoded in some random location if lcd_clicked gets triggered?
Ok, then we're getting somewhere:) I'll spend some time looking at the old logic and the new one and push some new code this week.
Is there perhaps some deadtime hardcoded in some random location
Yes, and it bothers me too! There's a deadtime for all inputs, which makes sense for the click and button/touch based navigation, but not for the encoder. I'm actually using a modded version of this branch that eliminates that. I'll work on that after this pr :)
I tried the old reset algorithm and it makes my encoder (2 pulses per step) feel all mushy. If I move the wheel one step at normal speeds, it will often reset ecoderDiff midway, resulting in an unresponsive UI.
The old code resets every 100ms or less, so any speed rate of less than 10 pulses per second results in no UI response. 10 pulses per second are 5 steps/s in my encoder and 2.5 in yours, which explains why it makes my setup worse while yours is ok. Buuuut it also means the reset timeout should be measured per step and not per pulse!
I think a good balance is to have the reset timeout be 1/pulses_per_step seconds (250ms in your setup and 500ms in mine). Also to keep the old logic that would advance a step instead of resetting when encoderDiff = ±3
All right @XDA-Bam. I re-added the old logic. The difference is that a reset may result in a full step movement if encoderDiff is past half a step (i.e ±3 pulses in your encoder).
Excluding the debouncing, the only difference between this part and the original logic is that the reset happens after RESET_ENCODER_AFTER_MS = 1000ms/ENCODER_PULSES_PER_STEP (250ms in your case) instead of depending on how often MarlinUI::update() is called.
This still works fine in my encoder so I'd prefer RESET_ENCODER_AFTER_MS it not to be too much below 1000ms/ENCODER_PULSES_PER_STEP. But if this is still not acceptable in your setup, then please try with 500 and let me know.
Thanks for your tests! I hope I can stop bothering you with this very soon :). Worse case I revert that part of the PR (but keep the debouncing logic) and live forever with a custom branch for my printer.
Just updated to 9e21330d7ae8b4979fee10099294e9e61c939613 and tested the latest changes. I specifically also tested slow scrolling, which isn't how I typically navigate the menus. I think I noticed the occasional lost step. I then tested #define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP), which should revert to 500 ms reset delay for my machine and found that to be superior. Therefore, I change my previous recommendation to go with 250 ms and would now advice to go with the old value wich is now equivalent to 2000/ENCODER_PULSES_PER_STEP.
I also quickly noticed occasional double stepping. Overall, my impression is that the encoder feels just as mushy and imprecise as before with these latest changes. I also tested some of my own code last week which rounded/ceiled fullSteps in case of 0<encoderDiff<epps and saw a similar effect, where I would just randomly get unexpected double steps. Overall, this makes the encoder much less predictable than ocasionally "forgetting" a step in my opinion. I would therefore vote to remove the logic of rounding up if the encoder is past half a step that was re-introduced in 92c4c7d36cf8af076500f6dc9a9a5d3c93990b82.
If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further and other testers should check how the new logic behaves on their machine.
Overall, this makes the encoder much less predictable than occasionally "forgetting" a step in my opinion
I strongly agree
If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further
So specifically that's the previous commit (https://github.com/MarlinFirmware/Marlin/pull/26723/commits/c88345d99e7e15ac702a0bdc24e25d467dbd112d) but with
#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)
Right?
I'll update after your confirmation and then we should probably le the maintainers know.
If these suggested changes are implemented, I would say we'll reach a state where CR10_STOCKDISPLAY can't be improved much further
So specifically that's the previous commit (c88345d) but with
#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)Right?
Yes, exactly.
There's one (last?) trick we could try: only apply the threshold rule is the direction has changed
I would expect the encoder to jump nonetheless, but it takes like 3 minutes to test so let's test.
Ok, @XDA-Bam, I added the last test. It "advances the encoder after a while if it has crossed the half a step threshold while changing direction"
There's now two timeouts
#define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP)
#define ADVANCE_ENCODER_AFTER_MS (200/ENCODER_PULSES_PER_STEP)
I believe this will make your encoder rock solid (mine is unaffected and already solid).
I'll let you and the maintainers judge if the extra code is worth it :)
@dbuezas Just tested 2100d45 and I'd say it is an improvement over 92c4c7d but still occasionally double steps. I certainly provoked it by intentionally changing directions quite often, but in testing we're explicitly looking for edge cases so I think that's fair 😁 . Just as before, this is also getting hard to test reliably by hand.
Considering the increased complexity for tracking lost half steps, I would still vote to keep it simple and go back to c88345d plus #define RESET_ENCODER_AFTER_MS (2000/ENCODER_PULSES_PER_STEP).
Sounds good. I'll be happy to see that nasty hack go