heaters: drop control delay
Heater control is reactive: the faster we apply a new PWM value, the faster we see a response.
This adds a command that applies the PWM value directly, as soon as it arrives.
As a bonus, the time constraints are relaxed, so "Timer too close" should essentially never happen.
As a safety check, we still have the "Missed scheduling of next digital out event" error.
This code is mostly related to this topic/comment: https://klipper.discourse.group/t/remove-the-300ms-heater-control-delay/17771/11
~~I'm not sure about the right value for reqclock. If I understood the code correctly, a command with reqclock=0 should be sent in the next batch.~~ - this turned out to be correct.
This is more of a simple and safe solution, because we just avoid the timer calculation on the host.
I think the proper way to do this for heaters is to add another timer function and update the on_duration, off_duration and end_time fields on each command arrival.
In addition, if we want to remove the 5-second limit, a cycles counter can be used there, decremented on each cycle, so there is no dependence on the timer resolution.
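A very rough sketch of what such a timer could look like (purely illustrative, not code from this PR; the struct and helper names here are hypothetical, while `container_of`, `gpio_out_write`, `SF_RESCHEDULE`/`SF_DONE` and `struct timer` are the usual Klipper MCU primitives):

```c
// Hypothetical sketch only (assumes the usual src/gpiocmds.c-style includes):
// a soft-PWM timer driven by a cycle counter that is reloaded on every
// command arrival, so the safety shutoff no longer depends on an absolute
// end_time or on the timer resolution.
struct soft_pwm_s {
    struct timer timer;
    struct gpio_out pin;
    uint32_t on_duration, off_duration;
    uint32_t cycles_left;   // reloaded by each incoming command
    uint8_t pin_on;
};

static uint_fast8_t
soft_pwm_cycle_event(struct timer *timer)
{
    struct soft_pwm_s *s = container_of(timer, struct soft_pwm_s, timer);
    if (s->pin_on) {
        // End of the on-phase: switch off and wait out the rest of the cycle
        gpio_out_write(s->pin, 0);
        s->pin_on = 0;
        s->timer.waketime += s->off_duration;
        return SF_RESCHEDULE;
    }
    if (!s->cycles_left)
        // No refresh arrived in time: leave the output off (safety timeout)
        return SF_DONE;
    s->cycles_left--;
    // Start of a new cycle (0%/100% duty special cases omitted for brevity)
    gpio_out_write(s->pin, 1);
    s->pin_on = 1;
    s->timer.waketime += s->on_duration;
    return SF_RESCHEDULE;
}
```

The command handler would then only refresh `on_duration`, `off_duration` and `cycles_left` on arrival, with no future clock to compute on the host.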
Thanks.
Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html
There are some steps that you can take now:
- Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
- Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
- Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.
Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.
Best regards, ~ Your friendly GitIssueBot
PS: I'm just an automated script, not a human being.
Thanks. For what it is worth though, I'm not sure this is a good idea. In particular, I'm not sure the safety checks would work the same way they do now. (For example, consider the case where the host has frozen or delayed but has some messages queued up for transmit.)
I guess the main question I would have is - how do we measure an improvement from a change like this? What kind of tests can be run to demonstrate the real-world benefits of it?
Cheers, -Kevin
I understand the safety concerns around heaters, so I tried not to break the important one: the timeout.
> the case where the host has frozen or delayed but has some messages queued up for transmit.
I reread the code, and I agree there are hypothetical situations where a lot of updates sit in the queue (up to 5/0.3 ≈ 16): if the queue emits them fast enough that the timers never reach their end time, yet slowly enough (say, 1 message every 4 seconds), the heater will never time out with my patches.
The current clock requirements force all those messages to "decay". Using send_wait_ack (to "block" heater updates) would be "enough" to avoid that.
However, the initial intention was to reduce the apply latency. I think the proper solution looks like this: apply immediately if the command has arrived early.
// Wrapper around command_queue_digital_out(): if the requested clock is
// still in the future, pull it forward to "now" plus a small 1ms margin,
// so the new PWM value is applied as soon as it arrives.
void command_queue_digital_out_now(uint32_t *args)
{
    uint32_t oid = args[0];
    uint32_t time = args[1];
    uint32_t on_duration = args[2];
    uint32_t time_now = timer_read_time() + timer_from_us(1000);
    if (timer_is_before(time_now, time))
        time = time_now;
    uint32_t new_args[] = {oid, time, on_duration};
    command_queue_digital_out(new_args);
}
DECL_COMMAND(command_queue_digital_out_now,
             "queue_digital_out_now oid=%c clock=%u on_ticks=%u");
> I guess the main question I would have is - how do we measure an improvement from a change like this? What kind of tests can be run to demonstrate the real-world benefits of it?
There is no rush or pressure from my side; I'm trying to fix an issue that is not even mine. TL;DR: to answer the question directly, an actual real-world test would be to use an overly powerful heater or a hotend with very low thermal mass and print something like LW-PLA. If there is a thermal issue, we will see a different volume of material in different places because of temperature instability.
There are users with strange extruder heaters and, as a consequence, strange PID regulation behavior. I think the PID itself is totally fine, so only the heater/thermistor/thermal mass/loop time can be the issue.
I can build a test setup with a small-thermal-mass heater and thermistor and check whether it is more stable, but that would be just a spherical cow in a vacuum.
A perfect test would be to get results from users with unstable PID/temperature and check whether things improve, i.e. whether the fix is in the right direction. I have only heard of strange behavior (I'm talking about 1-2 degree swings) on:
- Goliath hotend
- Bambu hotends
So, that is the target audience, and I think they are the ones who can actually give usable feedback: a magnified before/after graph, graphs from the log file (or just the log file), or we could even apply some analysis to check whether it is more "stable".
Such degree swings, I think, only matter for materials with temperature-dependent properties (Filaflex foam or LW-PLA, as far as I know). For everything else, this is just a cosmetic issue on the graph.
Thanks
> For what it is worth though, I'm not sure this is a good idea. In particular, I'm not sure the safety checks would work the same way they do now. (For example, consider the case where the host has frozen or delayed but has some messages queued up for transmit.)
Maybe you will consider the following; I may be missing some safety net here. Goals: we care about the maximum duration for safety, it would be nice to reduce TTC errors, and it would be nice to minimize the delay.
To get rid of scheduling 300ms ahead, we can slightly change the semantics and send the command with the end time instead of the start time.
This is only intended for heaters, where there is no time scheduling AFAIK.
Like: `queue_digital_out_new oid=%c end_clock=%u on_ticks=%u`
In that case, we will still receive an error:
- if the command is delayed too much: "Missed scheduling of next digital out event"
- if it arrived too late: "Scheduled digital out event will exceed max_duration"
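A rough sketch of what that handler could look like (hypothetical, not code from this PR; it reuses the on_duration/end_time fields mentioned above, assumes the PWM timer was already started by the existing queue_digital_out path, and omits IRQ locking for brevity):

```c
// Hypothetical sketch of the end_clock semantics; command name and details
// are illustrative only. oid_lookup(), timer_is_before(), timer_read_time(),
// shutdown() and DECL_COMMAND() are the usual Klipper MCU helpers.
void
command_queue_digital_out_new(uint32_t *args)
{
    struct digital_out_s *d = oid_lookup(args[0], command_config_digital_out);
    uint32_t end_clock = args[1], on_ticks = args[2];
    if (timer_is_before(end_clock, timer_read_time()))
        // The command arrived after its own end time
        shutdown("Scheduled digital out event will exceed max_duration");
    // Apply the new duty cycle immediately; the running PWM timer keeps
    // cycling and shuts the output off at end_clock if no refresh arrives.
    d->on_duration = on_ticks;
    d->end_time = end_clock;
}
DECL_COMMAND(command_queue_digital_out_new,
             "queue_digital_out_new oid=%c end_clock=%u on_ticks=%u");
```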
If there are subsequent updates, they will almost immediately override the current value; if there are none (the heater code caches the PWM value and suppresses changes of less than 5%), the current value is still valid until the end time.
In the case where the queue/process has frozen, the last one received wins.
As a bonus, the value can be applied on receipt, so there is less chance of getting a TTC.
The only possible downside I can imagine is mostly possible now already if we spam new PWM values (with the new semantics it would be equivalent to spamming in the case of a stacked-up queue): the total on_duration can exceed what max_power allows.
*but I think accounting for the on_duration remainder should mostly mitigate this: https://github.com/Klipper3d/klipper/pull/6698#issuecomment-2565604455
Thanks
Thanks for working on this.
> Goals: we care about the maximum duration for safety, it would be nice to reduce TTC errors, and it would be nice to minimize the delay.
I see value in reducing the round-trip delay. I also think we need to be confident the maximum duration check still remains. For what it is worth, I don't think trying to reduce "timer too close" errors is going to be productive. In my experience from looking at logs, when a timer too close error occurs, the deadlines are typically missed by over a second. In general, if the code can't make a 300ms deadline, there's no reason to think it'll make a 1000ms deadline either - and at that level of unresponsiveness, something is going to break. So, I fear trying to optimize this particular area would just move the problem "one spot over".
> This is only intended for heaters, where there is no time scheduling AFAIK.
FYI, there are a couple of benefits to scheduling the heater update:
- The change in PWM value is well synchronized to the temperature sensing. The next power change always occurs after the next sensor reading (typically 8ms from the nominal start of the sensor reading). Thus, if the heater is not fully on, the temperature readings will be taken while the heater power is off.
- The change in PWM value is well synchronized to previous PWM pulses. By default, the heaters use a 100ms pulse cycle time and the time between power changes is 300ms. So, the next heating update is always after the previous heating cycle completes. If this isn't the case, it can lead to systemic errors in heating rate. For example, consider the case where power is set to 50%, which would be 50ms on duration. Should a new update (eg, 51%) be received while in the middle of the previous cycle, it would be applied immediately. That is likely to bias the heater towards higher power output (should the new update arrive 40ms into the previous pulse then it'll have the effect of a 91ms pulse, and should the new update arrive 60ms into the previous cycle then there would be 101ms of on time over the past 111ms). The amount of heating would also depend on round-trip-latency which could lead to hard to debug artefacts.
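To make the bias in that second point concrete, here is a small stand-alone calculation that just replays the numbers above (illustrative only; the 100ms cycle and the 50%→51% update are taken from the example in the comment):

```c
#include <stdio.h>

// Replays the worked example above: a 50% duty pulse (50ms on per 100ms
// cycle) interrupted by a 51% update that is applied immediately instead of
// at the next scheduled cycle boundary.
int main(void)
{
    const double old_on = 50.0;   // ms on under the old 50% duty
    const double new_on = 51.0;   // ms on under the new 51% duty

    // Update arrives 40ms into the pulse (output still on): the pulse is
    // restarted, giving 40 + 51 = 91ms of contiguous on time.
    double arrive = 40.0;
    printf("arrive at %2.0fms -> %.0fms contiguous pulse\n",
           arrive, arrive + new_on);

    // Update arrives 60ms in (pulse already ended at 50ms): the output is
    // turned back on for 51ms, i.e. 101ms of on time within the last 111ms.
    arrive = 60.0;
    printf("arrive at %2.0fms -> %.0fms on over the last %.0fms\n",
           arrive, old_on + new_on, arrive + new_on);

    // A scheduled apply would instead give exactly 50ms on in this cycle and
    // 51ms on in the next one, with no extra heating from the update timing.
    return 0;
}
```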
Again, I agree there is a benefit to reducing the round-trip-time of heater updates. I fear there are some subtle trade-offs though.
Cheers, -Kevin
> I fear trying to optimize this particular area would just move the problem "one spot over".
I agree. I just somewhat expected that folks who do not use fans at all (there are some) would have essentially "no" TTC sources from the PWM side. That is minor, of course.
> Thus, if the heater is not fully on, the temperature readings will be taken while the heater power is off.
I had just never thought of that, and I like how interconnected they really are. I used to think this was just constant/value propagation, i.e. no point in using different values (0.3), but now it sounds much more thought out.
> The change in PWM value is well synchronized to previous PWM pulses. By default, the heaters use a 100ms pulse cycle time and the time between power changes is 300ms.
Well, that is actually mind-blowing, in the positive sense.
I use a higher PWM frequency for DC loads (40-400Hz for fans, LEDs, and heaters).
Those sound like magic internal behaviors, but here is one last idea:
What if we add an internal switch to the "low latency" behavior when pwm_cycle_time <= 0.01 (or less)?
With such settings there is already a high chance of electrical noise, because the heater can switch in the middle of ADC measurements (most of the time if PWM > 20%), given the 10ms cycle time vs the 8ms measurement time. As a bonus, it limits the systematic error to ~1%.
In the worst-case scenario, at 90% power, even if a new update arrives 9ms into the previous cycle, the output will stay on for another 9ms; and even if 5 such updates (which is unlikely) all arrive at the very end of their previous cycles, that adds up to 45ms of extra on time, which sounds acceptable and stays below a 5% power error (like the update-suppression threshold): 945ms of on time vs the 900ms expected over 1s.
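Just to make that worst case explicit, a quick back-of-the-envelope check (the figure of 5 updates per second comes from the sentence above; the rest is the same arithmetic):

```c
#include <stdio.h>

// Worst case sketched above: 90% power on a 10ms soft-PWM cycle, with every
// update applied immediately right at the end of the previous on-phase.
int main(void)
{
    const double cycle_ms = 10.0;
    const double duty = 0.90;
    const double on_ms = duty * cycle_ms;            // 9ms on per cycle
    const int updates_per_s = 5;                     // pessimistic update rate
    const double extra_ms = updates_per_s * on_ms;   // 45ms of extra on time
    const double expected_ms = duty * 1000.0;        // 900ms expected per second

    printf("%.0fms on vs %.0fms expected (+%.1f%%)\n",
           expected_ms + extra_ms, expected_ms,
           100.0 * extra_ms / expected_ms);          // 945 vs 900, +5.0%
    return 0;
}
```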
I feel guilty about the large number of PRs. So, this is the last attempt; feel free to say no, and I can close this PR with peace of mind.
Thanks
Meh, my scheduling with end time is borked, so I dropped it.
> I feel guilty about the large number of PRs.
I appreciate your contributions and the research you've done on various areas of the code. Please don't feel guilty about opening PRs.
I agree it would be nice to improve the heater feedback rate. I'm certainly open to changes, but as you've probably guessed, I'm also leery of regressions. So, I want to make sure any changes to the heating code are discussed and possible failures considered.
Cheers, -Kevin
The last time I asked someone who prints with LW-PLA, the answer was that there is no real reason to try to control the temperature that tightly; the ability to heat up or cool down quickly matters more in practice.
So, since this code currently does not change anything except the scheduled arrival time, I don't think it is worth it, as there is practically close to zero reason to pursue it.
Thanks.