klipper icon indicating copy to clipboard operation
klipper copied to clipboard

Make it safe for klippy:ready handlers to execute UPDATE_DELAYED_GCODE

Open rschaeuble opened this issue 1 year ago • 4 comments

When another klippy:ready handler executes UPDATE_DELAYED_GCODE before DelayedGcode's ready handler runs, then self.timer_handler is None, causing the update_timer to fail.

rschaeuble avatar Jun 29 '24 14:06 rschaeuble

This actually happened to me with moonraker-timelapse, whose Moonraker plugin executes GCode once Klippy is ready.

rschaeuble avatar Jun 29 '24 14:06 rschaeuble

This looks plausible, but it makes this branch do unnecessary work to calculate a value that is discarded. You can instead make the entire branch conditional:

        elif self.timer_handler:
            waketime = self.reactor.NEVER
            if self.duration:
                waketime = self.reactor.monotonic() + self.duration
            self.reactor.update_timer(self.timer_handler, waketime)

flowerysong avatar Jul 02 '24 07:07 flowerysong

@flowerysong: you are right; making the whole branch conditional is better. I've changed the code accordingly.

rschaeuble avatar Jul 02 '24 19:07 rschaeuble

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

  1. Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
  2. Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
  3. Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards, ~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

github-actions[bot] avatar Jul 17 '24 00:07 github-actions[bot]

Alright, so here's a self-review based on the steps from https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review:

  1. I have tested the change on my printer; it works as expected and solves the problem. Tests run successfully. I believe the code is ready for deployment.
  2. The change solves a real-world problem. I also found at least one other case in Discord where someone had the same issue (but didn't find the solution then).
  3. I provided a signed-off-by line in my commit. The change is entirely my own.
  4. The change follows the coding style of the modified code.
  5. No documentation update required.
  6. It's a single commit with a single, small change.

If there's more that I can do to speed this up, please let me know.

rschaeuble avatar Sep 30 '24 16:09 rschaeuble

Thanks. Could you describe the problem being solved? In what situations would a 'klippy:ready' event run a g-code command? (If there are any cases - those cases should be fixed as it is not valid to run g-code from the "klippy:ready" callback.)

-Kevin

KevinOConnor avatar Sep 30 '24 16:09 KevinOConnor

@KevinOConnor: I had this problem when using moonraker-timelapse. It's Moonraker plugin executes GCode once Klippy is ready. The executed GCode uses UPDATED_DELAYED_GCODE. When that happens before delayed_gcode has fully initialized, an exception is raised.

I don't know why, but it seems there is some amount of randomness involved. It seems to work for most people (and initially it worked for me), but for some reason sometimes the Moonraker plugin's ready-handler comes first, causing the issue.

Is there another event moonraker-timelapse's plugin could listen for? One that is emitted after all klippy_ready handlers have run?

rschaeuble avatar Sep 30 '24 17:09 rschaeuble

The "klippy:ready" callback is internal to the Klipper code (that is, only relevant to code in this github repo). If you've found some way to create an error in the code, then I'd start by providing a Klipper log of the event. Make sure you are running the pristine klipper code from this repo, reproduce the event, issue an M112 command, and then attach the full unmodified log here - or on discourse as described at https://www.klipper3d.org/Contact.html#i-found-a-bug-in-the-klipper-software .

Cheers, -Kevin

KevinOConnor avatar Sep 30 '24 17:09 KevinOConnor

@KevinOConnor: your comments and questions made me look deeper into how Moonraker and Klipper interact.

klippy.py contains the following code:

self._set_state(message_ready)
for cb in self.event_handlers.get("klippy:ready", []):
  if self.state_message is not message_ready:
    return
  cb()

The first line sets the state_message. It is this message that Moonraker uses internally to detect when Klippy is ready. Once it reports ready, Moonraker executes its internal server:klippy_ready event handlers. One of those (the one from moonraker-timelapse) then runs UPDATE_DELAYED_GCODE.

Now with a bit of bad luck, apparently Moonraker catches the ready state before Klipper's internal event handlers have all run.

Wouldn't it make sense to first run the klippy_ready handlers and only then execute self._set_state(message_ready)? This way, once Moonraker sees Klippy as ready, it is guaranteed to be ready with no potential for this race condition.

rschaeuble avatar Sep 30 '24 17:09 rschaeuble

My vague recollection is that some of the ready callbacks may inspect the current state and thus we need to transition prior to invoking the callbacks.

That said, I don't think it should have been valid for a command to run between the set state and the completion of that loop. The main loop is single threaded and none of the ready callbacks should have "yielded" to allow a command handler to run. So, something seems wrong. It'll take some work to track down which internal "klippy:ready" callback is misbehaving.

-Kevin

KevinOConnor avatar Sep 30 '24 18:09 KevinOConnor

I have to admit that I don't really understand how Klipper's reactor works, and what causes a "yield' and what doesn't.

However, I have a suspicion: I'm using the klipper_tmc_autotune plugin. This plugin has a klippy:ready handler in which it communicates with the tmc2209 chips that I have in my printer.

If I comment out this line, then the error goes away: https://github.com/andrewmcgr/klipper_tmc_autotune/blob/489a1789d5ddb5510dcdd953aaa7780892d63f0e/autotune_tmc.py#L241

Would using tmc_uart to send queries result in a "yield"?

rschaeuble avatar Sep 30 '24 19:09 rschaeuble

Klipper does not have a plugin system. Any changes to the repo (including adding of files) is a modification of the code. From what you describe, it does sound like those modifications are defective as it is not valid to issue commands like that from the "klippy:ready" handler.

-Kevin

KevinOConnor avatar Sep 30 '24 21:09 KevinOConnor

I’ll try sending a PR to autotune_tmc that fixes this. It’s a nasty issue, quite hard to diagnose, and could trigger all kinds of issues in the future.

Please allow me a few questions to make sure I got everything right:

  1. klippy:ready callbacks are not allowed to execute GCode or do anything that switches to another greenlet. The latter means all code that ends up in Reactor.pause() must not be used. Is this correct?

  2. Is the same true for klippy:connect and all the other events? I assume yes, as that sounds like a reasonable restriction.

  3. This issue should be fixable by moving the offending code from the ready-handler to a timer callback. The result would look like this:

def handle_ready(self): # the klippy:ready handler
  reactor.register_timer(self._callback, reactor.NOW)

def _callback(self):
  # do stuff that may switch to another greenlet

This will allow the loop that calls all event handlers to finish before the timer callback is called.

  1. Would a PR be welcome that extends the documentation to state these restrictions?

rschaeuble avatar Oct 02 '24 11:10 rschaeuble

Thanks.

Some answers to your questions:

  1. Yes. It shouldn't call pause() and shouldn't call anything that can raise an error ( see the short blurb at https://www.klipper3d.org/Code_Overview.html#adding-a-host-module ).
  2. No. Every event has its own execution context (as mentioned at the blurb above) and there are no general rules for them. The "klippy:connect" callbacks definitely can pause and definitely can raise config errors.
  3. Yes. Might be simpler to use a reactor.register_callback() though.
  4. Yes.

More generally, the "klippy:connect" event is a general "do what needs to be done to configure and establish communications with remote micro-controllers" and thus has a lot of flexibility. The "klippy:ready" callback is an instantaneous notification of a state transition that can't do anything complex nor raise errors, because that could leave the system in a "weird half way ready/not-ready" state.

Cheers, -Kevin

KevinOConnor avatar Oct 02 '24 14:10 KevinOConnor

Thank you for the answers, Kevin.

Here's the PR for the documentation: https://github.com/Klipper3d/klipper/pull/6702

And here's the PR for autotune_tmc, for those interested in that: https://github.com/andrewmcgr/klipper_tmc_autotune/pull/203.

rschaeuble avatar Oct 02 '24 16:10 rschaeuble