Marlin
Temperature variance monitor tweaks
Description
It appears that some heaters tend to keep their temperature constant for prolonged periods of time, thus triggering the thermal malfunction error and halting. So some tweaking is necessary to filter out such false positives.
This tweak allows defining a larger detection window for thermal malfunction, without affecting the other thermal runaway checks. This poses the risk that the time window override might be too large for some faster heaters, allowing them to reach higher temperatures in the case of real temperature malfunction.
Also, added some detailed information in the comments about the feature and how to tweak it.
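To make the trade-off concrete, here is a minimal sketch of a variance check with a configurable window. It is an illustration only, not the code in this PR: `VarianceMonitor`, `window_ms` and `update()` are hypothetical names, and the real firmware tracks this per heater inside its thermal protection state.

```cpp
#include <cstdint>

// Hypothetical sketch: flags a heater whose reported temperature has not
// changed at all for `window_ms` milliseconds. Not Marlin's actual code.
struct VarianceMonitor {
  uint32_t window_ms;           // how long a reading may stay frozen
  float    last_temp = 0.0f;    // last distinct temperature seen
  uint32_t last_change_ms = 0;  // when that temperature was first seen
  bool     primed = false;      // false until the first reading arrives

  explicit VarianceMonitor(uint32_t window) : window_ms(window) {}

  // Feed one reading. Returns true when the value has been frozen for at
  // least the whole window, i.e. a suspected stuck sensor reading.
  bool update(float temp, uint32_t now_ms) {
    if (!primed || temp != last_temp) {
      primed = true;
      last_temp = temp;
      last_change_ms = now_ms;
      return false;
    }
    // Unsigned subtraction stays correct across millisecond counter wraparound.
    return (now_ms - last_change_ms) >= window_ms;
  }
};
```

A larger `window_ms` tolerates a very stable heater for longer, but it also delays detection by the same amount if the reading really is stuck, which is the risk described above.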
Requirements
THERMAL_PROTECTION_VARIANCE_MONITOR
Benefits
Allows finer tweaking in the case of very stable heaters.
Configurations
Related Issues
#20749, #23373
This is the most decent solution I could think of. Other tweaks I considered:
- define separate time windows for each heater (practically not an important improvement)
- detect variance globally: do not halt as long as any heater's reading updates. This does not take into account different polling methods, though (thermistors are updated independently from thermocouples)
I think the issue at hand is rare (such extreme temperature stability), but still the firmware should be as reliable as possible, so it is important that the variance feature remains enabled, even with some compromise. There's room for discussion here, so I'm leaving this PR as a draft for the moment.
Some info that may be helpful: one of my printers triggers this pretty consistently on the hotend heater.
According to OctoPrint, the hotend temperature only varied from target a few times over 30+ minutes, and then only by 0.2 C at most. Granted, OctoPrint is only getting the temperature periodically, so the actual temp probably does fluctuate more often, but it is very stable:

This is an Ender 3 Pro with a MicroSwiss hotend using stock thermistor and heater, so not some rare/odd setup.
I'll try to flash these changes and see if it improves, but current bugfix renders my printer unusable with this feature turned on unfortunately.
Thank you @The-EG, this is valuable information. I guess this is with the default thermal protection periods, so do give it a try if you will, with the period override (120 seconds or maybe more) and see how it works. You don't have to print anything, just keep the heater on.
If you still get errors, you should keep the feature off, and we might have to think of a different solution.
Yes, sorry I didn't specify, that was with the default periods. I pulled this PR down and set the period to 120 seconds, reflashed, heated up the hotend and waited. It failed after 7 minutes. Here are the temps that got reported back to OctoPrint during that time, starting just before the temp stabilized:
Log Output
Recv: T:219.17 /230.00 B:19.37 /0.00 @:127 B@:0
Recv: T:222.14 /230.00 B:19.37 /0.00 @:73 B@:0
Recv: T:224.29 /230.00 B:19.37 /0.00 @:51 B@:0
[...]
Recv: T:226.00 /230.00 B:19.41 /0.00 @:40 B@:0
Recv: T:227.00 /230.00 B:19.37 /0.00 @:40 B@:0
[...]
Recv: T:227.00 /230.00 B:19.45 /0.00 @:58 B@:0
Recv: T:227.00 /230.00 B:19.41 /0.00 @:69 B@:0
Recv: T:227.00 /230.00 B:19.49 /0.00 @:78 B@:0
[...]
Recv: T:227.00 /230.00 B:19.41 /0.00 @:84 B@:0
Recv: T:227.12 /230.00 B:19.45 /0.00 @:87 B@:0
[...]
Recv: T:228.00 /230.00 B:19.41 /0.00 @:75 B@:0
Recv: T:229.00 /230.00 B:19.53 /0.00 @:62 B@:0
Recv: T:229.75 /230.00 B:19.41 /0.00 @:52 B@:0
[...]
Recv: T:230.00 /230.00 B:19.49 /0.00 @:55 B@:0
Recv: T:230.00 /230.00 B:19.53 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.57 /0.00 @:62 B@:0
Recv: T:230.00 /230.00 B:19.53 /0.00 @:63 B@:0
Recv: T:230.00 /230.00 B:19.53 /0.00 @:64 B@:0
[...]
Recv: T:230.00 /230.00 B:19.69 /0.00 @:65 B@:0
Recv: T:230.06 /230.00 B:19.37 /0.00 @:63 B@:0
[...]
Recv: T:230.62 /230.00 B:19.65 /0.00 @:52 B@:0
Recv: T:230.94 /230.00 B:19.57 /0.00 @:48 B@:0
Recv: T:230.37 /230.00 B:19.65 /0.00 @:61 B@:0
[...]
Recv: T:230.00 /230.00 B:19.69 /0.00 @:66 B@:0
Recv: T:230.00 /230.00 B:19.80 /0.00 @:64 B@:0
Recv: T:230.00 /230.00 B:19.73 /0.00 @:63 B@:0
[...]
Recv: T:230.00 /230.00 B:19.77 /0.00 @:62 B@:0
Recv: T:230.00 /230.00 B:19.61 /0.00 @:62 B@:0
[...]
Recv: T:230.12 /230.00 B:19.96 /0.00 @:60 B@:0
Recv: T:230.87 /230.00 B:19.88 /0.00 @:44 B@:0
Recv: T:230.25 /230.00 B:19.84 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:19.80 /0.00 @:63 B@:0
Recv: T:230.00 /230.00 B:19.77 /0.00 @:61 B@:0
[...]
Recv: T:230.00 /230.00 B:19.80 /0.00 @:61 B@:0
Recv: T:230.00 /230.00 B:19.84 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:19.77 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.80 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.88 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:19.96 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:19.92 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.96 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:19.92 /0.00 @:60 B@:0
[...]
Recv: T:230.06 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.19 /230.00 B:20.00 /0.00 @:56 B@:0
[...]
Recv: T:230.06 /230.00 B:19.96 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:19.96 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.96 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:19.96 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:229.94 /230.00 B:20.00 /0.00 @:61 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:229.94 /230.00 B:20.00 /0.00 @:61 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
[...]
Recv: T:230.12 /230.00 B:20.00 /0.00 @:57 B@:0
Recv: T:230.06 /230.00 B:20.04 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.00 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.12 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:60 B@:0
[...]
Recv: T:230.00 /230.00 B:20.12 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.16 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.08 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.12 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.04 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.16 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.16 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.20 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.20 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.08 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.23 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.31 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.20 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.27 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.43 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.20 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.27 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.39 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.47 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.43 /0.00 @:59 B@:0
Recv: T:229.94 /230.00 B:20.39 /0.00 @:60 B@:0
Recv: T:230.00 /230.00 B:20.47 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.35 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.39 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.43 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.47 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.39 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.47 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.55 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.47 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.51 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.55 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.55 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.59 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.66 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.66 /0.00 @:59 B@:0
[...]
Recv: T:230.00 /230.00 B:20.70 /0.00 @:59 B@:0
Recv: T:230.00 /230.00 B:20.62 /0.00 @:59 B@:0
I'll turn this off for now, but if there's something more I can test later, let me know.
Edit: if it matters, this is with a Creality v4.2.7 board, so STM32F1.
Yeah, you better keep it off.
This makes me rethink the usefulness of variance monitoring as a general solution to the temperatures updating issue...
@zeleps Looks like you are finding more hardware that is falling over. P.S. I disabled the other PR and the printer completed a near-10hr print just fine. Rome wasn't built in a day, so keep going.
@CBDesignS unfortunately this is something you have to anticipate when writing software that may run on an assortment of hardware platforms. These features need a lot of comprehensive testing, and this is not an option for this project. I have to admit that you've been - involuntarily - part of the testing squad, since there is no other way.
My recommendation for the feature right now is to keep it disabled by default, with a suppressible compilation warning and detailed explanation of its function, configuration and expected behavior. I'll create a new commit in this direction probably tomorrow.
@thisiskeithb, could you please add a "needs discussion" label to this PR?
My recommendation for the feature right now is to keep it disabled by default,
I think that should be done now since it's currently enabled & will likely cause issues on any printer with stable temps.
I'm not home right now. Also, it should probably be done on 2.0.9.3 as well, but I don't think I can create a PR for that, can I?
Also, it should probably be done on 2.0.9.3 as well, but I don't think I can create a PR for that, can I?
thinkyhead has magical powers.
By all means then, please go ahead and disable it, I'll make my proposed changes in comments and warnings tomorrow.
@zeleps if it wasn't for you and like-minded devs I would still be running Sailfish on a prehistoric 8-bit board. I/we are just glad to help by testing on lots of different hardware so little bugs etc. can be squashed and keep Marlin going strong. Respect is offered where Respect is due.
Disabled the feature by default, updated comments, added a suppressible warning.
Also thinking of a mechanism complementary to the watchdog (in the main loop) that would ensure the temperature ISR is reading sensor values regularly. We don't really know if this is the source of the original problem, but it will help pinpoint the issue if it happens again. I'll propose something more concrete soon.
Also ran into this with the Ender 3 S1, default hardware configuration, properly PID tuned.
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: SD printing byte 194608/12029131
Recv: ok P0 B0
Send: M27
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.01 /60.00 @:55 B@:22
Recv: SD printing byte 194736/12029131
Recv: ok P0 B0
Send: M27
Recv: T:215.00 /215.00 B:59.99 /60.00 @:55 B@:30
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: T:215.00 /215.00 B:60.00 /60.00 @:55 B@:26
Recv: Error:Thermal Malfunction, system stopped! Heater_ID: E0
Changing monitoring state from "Printing from SD" to "Cancelling"
Send: M108
Recv:
Recv: T:215.00 /0.00 B:59.99 /0.00 @:0 B@:0
Recv: echo:busy: processing
Recv: T:215.00 /0.00 B:59.99 /0.00 @:0 B@:0
Recv: T:215.00 /0.00 B:59.99 /0.00 @:0 B@:0
Recv: start
Recv: Watchdog Reset
Recv: Marlin bugfix-2.0.x
Recv: echo: Last Updated: 2022-02-14 | Author: (none, default config)
Recv: echo: Compiled: Feb 17 2022
Recv: echo: Free Memory: 36319 PlannerBufferBytes: 1856
Recv:
May I suggest we do enable this by default for external temperature monitors like the MAX31865, which are more likely to have the required variances and have more likelihood of failing compared to the ADC inputs of the micro controller?
@Sebazzz, the problem we're trying to detect is related to the ADC input reading (which is done in the temperatures ISR), not external sensors. All reports in #20749 are related to onboard thermistors.
I've got many machines in chambers which hold very stable temps. This may work better by comparing the raw ADC readings (temp_hotend[e].raw) for any meaningful change, even one below the point that the conversion to C will show, or by comparing the current translated temperature to the direct reading and throwing an error if they deviate. That would avoid issues on very stable systems.
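One way to read the raw-reading suggestion is sketched below. It is an assumption-laden illustration, not existing Marlin code: `RawChangeDetector` and `min_delta` are hypothetical names, and the raw value is simply whatever integer the ADC path delivers before conversion to degrees.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: reports whether the raw ADC value moved by at least
// `min_delta` counts since the last accepted reference value.
struct RawChangeDetector {
  uint16_t min_delta;     // smallest raw difference that counts as a change
  uint16_t last_raw = 0;  // reference value for the comparison
  bool     primed = false;

  explicit RawChangeDetector(uint16_t delta) : min_delta(delta) {}

  bool changed(uint16_t raw) {
    if (!primed) {        // the first sample always counts as fresh
      primed = true;
      last_raw = raw;
      return true;
    }
    const int diff = int(raw) - int(last_raw);
    if (std::abs(diff) >= int(min_delta)) {
      last_raw = raw;
      return true;
    }
    return false;
  }
};
```

A timer like the one in the variance monitor would be reset whenever `changed()` returns true. Whether raw counts actually move more often than the converted temperature on a given board is something that would need measuring.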
Another thing to note: the hang issues you mentioned I have seen almost exclusively on overflashed BTT boards. They instruct users to flash 256k-rated MCUs to 512k, and issues like this occur. It's likely memory corruption, as rejected memory is being used. Has there been an absolute confirmation of hangs with heaters on with a board flashed within MCU-rated limits?
I considered that, but celsius_float_t can definitely and accurately represent a ±1 change in a 12-bit ADC sample (even 15-bit for that matter). Since we can only speculate about the source of the problem, the original idea was to try detecting the issue at a high level first, to cover as many code paths as possible, hence the temperature variance detection. If we wanted to verify that the ISR is working properly, we could just create a simple counter and make sure it's increasing constantly. This is simple and efficient, but covers only the ISR freezing scenario (which has no logical explanation so far).
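The simple counter idea could look roughly like this. All names here (`isr_ticks`, `temperature_isr_heartbeat`, `IsrHeartbeatCheck`) are hypothetical, and `std::atomic` is used only to keep the sketch portable; on a small MCU the firmware would more likely use a `volatile` counter read inside a critical section.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical heartbeat counter, incremented once per temperature ISR run.
std::atomic<uint32_t> isr_ticks{0};

// Would be called at the end of the temperature ISR.
inline void temperature_isr_heartbeat() {
  isr_ticks.fetch_add(1, std::memory_order_relaxed);
}

// Polled from the main loop at an interval longer than the ISR period.
struct IsrHeartbeatCheck {
  uint32_t last_seen = 0;

  // Returns true if the ISR has run since the previous call.
  bool alive() {
    const uint32_t now = isr_ticks.load(std::memory_order_relaxed);
    const bool advanced = (now != last_seen);
    last_seen = now;
    return advanced;
  }
};
```

If `alive()` returns false, the main loop could stop refreshing the watchdog or raise an error. As noted above, this only covers a frozen ISR, not an ISR that keeps running while reading stale values.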
My MCU has 512K flash and it was certainly not overflashed when the issue occurred. But it happened only a couple of times a long time ago, never seen it again. Are you sure that the cases were related to overflashing? Reports about #20749 are sparse and affected users have not managed to reproduce the issue consistently in order to gather more info. And this is the first time I'm hearing about overflashing related to the issue.
TBH I feel safer with the variance detection enabled on my system, but I understand that it tackles a rare problem (which might have been eliminated already) and that it does crash some well-tuned systems (although we are not really certain that all the cases that have occurred are false positives, are we?). Imho it's better to keep it there as an additional measure, disabled by default, so if anyone wants the extra safety or encounters the stuck temp readings issue, they can enable it.
I have a spare S71214C PLC with an RTD module lying around. The Slice hotends can fit a second thermistor in the block, so I could set it to data log and record the variance. Probably not going to get much more accurate than that... But it'll be a bit before I can dig down to it on my project list here, still digging out from the holiday automotive shutdown chaos.
This fixes the watchdog not resetting the printer when issue #20749 occurs on LPC176x MCUs. Many thanks to @MakerMeik for his patience and his efforts towards debugging this.
We're still looking to find the root cause and to prevent the ADC from stopping. Since this PR effectively detects the thermal malfunction, I'll open a new PR if and when we have new developments on the ADC issue.
I suggest retaining the thermal malfunction feature (disabled by default), since it's a high-level, platform-independent detection mechanism and it might come in handy in the future.
More info on the issue can be found in the discussion of #23373.
We're still looking to find the root cause and to prevent the ADC from stopping.
Which platforms are affected? ADC should persist as long as the register bits are not corrupted, and the Temperature ISR should continue as long as interrupts are enabled.
So far, we know about LPC176x, no other testimonies. The ISR works fine, as does the watchdog. What happens is that the ADC values are not refreshed. The doneness bit is off, but the HAL implementation ignores this and continues to read stale values (see this). My last commit stops stale values from being used, so the watchdog can kick in.
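As an illustration of what respecting the doneness bit means, here is a minimal sketch that works on a plain 32-bit register value. The bit positions (DONE in bit 31, a 12-bit result in bits 15:4) are my understanding of the LPC17xx ADC data register layout and should be checked against the user manual; `read_adc_if_done` is a hypothetical helper, not the HAL function changed in the commit.

```cpp
#include <cstdint>

// Assumed LPC17xx ADC data register layout (verify against the user manual):
constexpr uint32_t ADC_DONE_BIT     = 1u << 31;  // set when a conversion completed
constexpr uint32_t ADC_RESULT_SHIFT = 4;         // result occupies bits 15:4
constexpr uint32_t ADC_RESULT_MASK  = 0xFFFu;    // 12-bit result

// Hypothetical helper: writes the sample to `out` and returns true only if
// the DONE bit is set. A stale register value leaves `out` untouched.
bool read_adc_if_done(uint32_t data_reg, uint16_t &out) {
  if (!(data_reg & ADC_DONE_BIT)) return false;
  out = static_cast<uint16_t>((data_reg >> ADC_RESULT_SHIFT) & ADC_RESULT_MASK);
  return true;
}
```

A caller that gets `false` for too long has no fresh sample to act on, so instead of silently regulating on an old temperature it can let a higher-level safeguard such as the watchdog take over.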
I've given @makermeik (who frequently has the issue) a build that dumps related ADC register block values, as well as PCLKSEL0 and PCONP, to see if and what is affected when the issue occurs. We're waiting for the issue to occur and we'll have a complete picture (hopefully).
It's been a while. I haven't heard back from @MakerMeik since.
Last thing I know about this is that my latest commit (1ffe8b4) properly triggers the watchdog, thus preventing a thermal runaway (this was tested thoroughly).
I'm not using this board anymore, so I don't personally care whether this PR is merged or not, but I think the proposed way of handling ADC reads is definitely better than ignoring the doneness bit.
I guess it's time to close this, one way or another.
Believe it or not, the problem has not reoccurred since our last contact. And in the meantime I have made several dozen prints with the machine. According to my understanding, the print should at least stop when the problem occurs. But that was not (any longer) the case. Who knows whether the cause was a loose contact that just happened to get better again. I also assume that your last commit basically works. So I also think you should close the issue. If something changes, I would contact you anyway ;-) Anyway, thanks a lot for your great support @zeleps and the great work of you and the whole team!
Really glad to know that @MakerMeik! I don't know if reading and respecting the doneness bit actively solves the issue, but that's the code that Meik's been using.
One interesting development in the area of thermal protection is Prusa Firmware updating its behavior so that if the temperature goes too far outside the hysteresis range for too long, instead of killing the machine due to thermal runaway, it sounds an alarm and pauses the machine, allowing the user to resume if the temperature has simply been thrown off by something. That would be a good addition to Marlin as well.