klipper PR: Load Cell Endstop

The goal of this PR is to get from a gram scale all the way to being able to home a printer with a load cell.

An overview of whats in here:

The SOS Filter is created as a new utility component on the MCU.
The LoadCellEndstop is introduced - this allows load cell sensors to acts as data sources for an endstop that lives on the MCU. It can detect when a sudden force on the load cell indicates a collision. The SOS filter is used to do this.
The Load Cell sensors are modified so they can send data to a LoadCellEndstop
Finally the LoadCellProbe is introduced that can combine a load cell sensor the LoadCellEndstop together to turn it into a probe. The probe can home the printers z axis and act as a probe.

SOS Filter

This is a general purpose fixed point SOS filter implementation. It supports a maximum of 4 filter sections.

The primary purpose it to reject drift and noise while homing. There is a post about this here. This idea was originally introduced by Prusa Research code here. Their implementation uses an old code generator, mkfilter, and can't be easily changed without re-compilation.

This implementation uses Second Order Sections and uploads each section to the MCU to initialize the filter. This has a number of advantages for klipper:

This implementation uses fixed point math to get around limitations in klipper's communications protocol and potential issues using the FPU's on chip.
The SOS algorithm avoids using a costly division instruction in the filter loop, saving hundreds of clock cycles.
The only requirement for fast execution is single cycle SMULL which is a feature of every ARM Cortex M0+ core.

LoadCellEndstop

Most of the code in this PR is about LoadCellEndstop. This is not the end of the code for the load_cell_probe.py. There are another ~800 lines of code that will go into this file to do the post homing Tap Analysis. I see value in keeping these chunks of functionality separate and combining them with composition. People have reached out to me to do a Filament Scale that would have specific functionality for weighing filament, calculating filament used in a print and so on. Keeping load_cell relatively small and simple will let that development happen independently.

There is some copy/paste stuff in this code that I think is worth talking about. LoadCellEndstop combines the ProbeEndstopWrapper and MCU_endstop types. I want to re-use the code from ProbeEndstopWrapper in probe.py. My suggested refactor of ProbeEndstopWrapper is to add an optional constructor parameter that injects in the mcu_endstop instance and overrides its default behavior of creating one from a pin. Most of what ProbeEndstopWrapper does is really about multi-probe and it seems a real shame to copy/paste that. If that refactor sounds cool, I'll make that change.

Mar 25 '25 18:03 garethky

Re this diff: https://github.com/Klipper3d/klipper/compare/9756dcf8e4f897698b74ba9f93c29075923af701..09dfbd9ed05e56bd00b2f788d585ecff3095bbfe

There has been a very difficult to replicate bug where I got the "Timer Too Close" error. I couldn't get anyone else to report a replication. I think this change solves it.

The contract for toolhead.get_last_move_time() isn't well documented. If the last move has ended it actually returns a time in the future when you might expect to be able to begin executing the next move. That could be upwards of 350ms depending on the state inside toolhead. The 100ms padding I was adding was OK 99.99% of the time but very occasionally it would not be enough.

At least I think that's what the root cause is. Its so hard to replicate this organically. It might be a 1 in 1000 event.

Mar 27 '25 19:03 garethky

I had the safety range error happen to me today and it wasn't a good user experience. Specifically the error message doesn't say why the trsync was triggered with an error. There are 3 error conditions that could have triggered it:

The safety range could have been violated (this was my issue)
The watchdog timer could be violated
There could have been a fixed point math overflow

So I'm going to do a patch that increments the error code by 1 for each of these cases and emits a specific error message on the Host side.

Apr 02 '25 19:04 garethky

Thanks for working on this. I agree it would be good to merge this support into Klipper. I do have some comments on the changes.

I like how you've split out the SOSFilter into its own standalone module. I also like how you've split up this PR into separate commits. I'm going to review each commit separately. These responses will follow below.

Cheers, -Kevin

Apr 04 '25 00:04 KevinOConnor

I'm game for all those changes. I had thought about all of them (at one time or another) but didn't want to dump the effort into something that configurable until you had generally signed off on the idea.

I can make the changes as refactor commits and then squash them after we get to a good end state

Apr 04 '25 01:04 garethky

That completes my code review. Thanks again for working on this.

If you do rebase the commits, it would help a little if you could have each commit follow the "module: description" format on the first line. Just choose a single filename (without extension) for the module name - it's just to give context to the rest of the commit and doesn't have to be all inclusive.

Lastly, can you provide a little context for what is coming immediately after this PR? Will it include the "elbow detection", and is that work already completed or still in progress? Will there be changes to the existing mcu code, changes to existing config parameters, changes to the API interface, or similar?

-Kevin

Apr 04 '25 01:04 KevinOConnor

FYI, here is a mockup of a further generalized sos_filter.c (untested, but hopefully gives the idea): https://github.com/KevinOConnor/klipper-dev/commit/2e174e35550aaf996d907ca6ad4cdb20fed61c1a

-Kevin

EDIT: Actually, the python code should be using is_init=True - updated the commit above.

Apr 04 '25 04:04 KevinOConnor

That completes my code review. Thanks again for working on this.

Thank you for the detailed code review! I know its a lot of code.

I've read the comments and I see a couple of things we should discuss first.

It does mean you wont be able to use the sensor for homing though - not sure if that is an immediate problem for testing. As part of the "homing discussion" on Discourse, we definitely do want to rework the homing code so that homing can generally be done using just the probing interface.

This is a show stopper. Its not just my test machine but many other people that are testing this code and using it to develop hardware and software.

If I have to give up endstop compatibility I think the only path forward now is to wait until after you re-write homing to support this. Ultimately this is an endstop/homing sensor and there are plenty of machines without any other sensor for z. e.g. the Prusa MK4

The home_start() code is called with a specific time that is related to a schedule that the caller has synchronized with the steppers and other parts of the system. The callback may not choose another time nor delay the time - it may only start the sensor at the specified time.

The code works because non of that is checked or enforced (maybe some kinematics break this assumption?). This code starts the trsync with the correct time. But if that's the contract you want then that's fine. We just need an alternative to do the same thing.

Taring before probe is a hard requirement. Without the tare the sensor triggers instantly far too often to be useful. This same technique is in Prusa's codebase and the Magneto X folks noted a similar fix for their issues. The machine has to be mechanically quiet (physically still) before the call, it cant be issued on the host while the system is in motion. I.e. it cant be run ahead of time.

Again if you cant allow this then I think we have to wait for the homing refactor

Maybe a way to make progress now is to focus on the SOS Filter and just do a PR for that?

Lastly, can you provide a little context for what is coming immediately after this PR? Will it include the "elbow detection", and is that work already completed or still in progress?

The next PR is up to this tag: https://github.com/Klipper3d/klipper/compare/master...garethky:klipper:pr-probe-load-cell The bulk of it is in this commit which has all of the elbow detection stuff: https://github.com/Klipper3d/klipper/commit/5f5061e9d3b113634b49d6a8810f37318b41490d On a clean probe this code is working very well and has been verified on a number of different test printers already.

I'm still working through the last few details around plug-in tap classifiers and nozzle scrubbing. There is a basic set of checks that can classify a tap as provably unusable, but there is a blurry line between that and truly accurate. This can be evaluated with heuristics and/or machine learning (in Prusa's case its a decision tree classifier). I'm experimenting with both approaches.

Will there be changes to the existing mcu code, changes to existing config parameters, changes to the API interface, or similar?

No more MCU changes. The dump_tap endpoint gets extended with all of the processing results from doing the elbow finding. Several config parameters are added to control the pullback move and to handle tap classification and nozzle scrubbing.

Doc update commit: https://github.com/Klipper3d/klipper/commit/eb69cdf96c3b1e5b71071e2b51401f52fcab93d9

Apr 04 '25 05:04 garethky

For the load_cell_probe.py code my main feedback is that it is going to be very difficult to try to utilize the "homing" interface.

I've read the Discourse post about the homing routines, and I can pretty much agree on what was said. But I also think that this becomes a wider refactor which significantly increases the scope of this PR's goals. The open hardware for load cell probing is being wrapped up right now, and we're looking at about a 3 month window until release. We are so close with the software, and it's really exciting to see progress in this space being made. It would be silly to delay it, subject to the refactor taking place.

Perhaps as a thought, this PR could be split into two - probing and homing? If the two are cleanly separate from one another, swapping out the underlying homing implementation later should be trivial. Marking the old homing hacks as deprecated, imo, should only happen when the new system is in place to replace it.

Apr 04 '25 06:04 EllaFoxo

This is a show stopper. ... If I have to give up endstop compatibility I think the only path forward now is to wait until after you re-write homing to support this.

I understand, but I think it would be preferable to try to move forward so that we aren't waiting for a future "homing rewrite".

The code works because non of that is checked or enforced (maybe some kinematics break this assumption?).

Short answer is that if you want to pause the toolhead, you need to do that in probe_prepare() and not home_start(). You can take a look at how bltouch.py uses probe_prepare() to deploy the sensor, or look at how probe.py uses probe_prepare() to run arbitrary g-code.

Long answer is that the general homing code needs to be able to home multiple steppers to multiple endstops simultaneously, and if one of the callbacks doesn't use the same timing it'll mess that up badly. It's possible you've found a way to work around that locally, but I'm not confident you wont run into weird errors. It would also be a maintenance issue to have one special module using a different scheme than all the other modules.

I'll follow up in another message with some recommendations on how one can integrate.

No more MCU changes. ...

Okay, thanks. I'll try to look through that information.

-Kevin

Apr 04 '25 23:04 KevinOConnor

There are two main internal probing interfaces: the probe command interface and the homing interface. The probe command interface is used for PROBE type commands (including PROBE_ACCURACY, BED_MESH_CALIBRATE and similar), while the homing interface is used by G28.

The probe command interface involves session = pprobe.start_probe_session(), session.run_probe(), session.pull_probed_results(), and session.end_probe_session(). You can see this in action by looking at the probe.py:run_single_probe() command.

The homing interface to probing involves multi_probe_begin(), probe_prepare(), home_start(), home_wait(), probe_finish(), and multi_probe_end(). This interface is more complex (and less flexible) because it is designed for homing to endstops, and in particular simultaneous homing of different carriages to different endstops.

I can make a few suggestions on how to integrate the loadcell code today.

Use just the probing command interface

This interface provides the most flexibility and has the fewest caveats. Basically, on each run_probe() invocation you can move the toolhead however you like - just store the results of the probe process for later retrieval via pull_probed_results(). Unfortunately, there is no good way to hook this up to G28 .

One could, however, still be able to home via the probe using something like:

[gcode_macro SET_Z_FROM_PROBE]
gcode:
    {% set pcf = printer.configfile.settings["probe_eddy_current eddy"] %}
    SET_KINEMATIC_POSITION Z={ printer.toolhead.position[2] - (printer.probe.last_z_result - pcf.z_offset) }

[gcode_macro HOME_MY_Z]
gcode:
    SET_KINEMATIC_POSITION Z={printer.toolhead.axis_maximum[2]}
    PROBE
    SET_Z_FROM_PROBE

Just as long as one is careful to home the Z using the custom HOME_MY_Z module, then it should be possible to use this for testing. Alas, this doesn't play well with modules like [safe_z_home] which is designed to issue G28 commands. So, I understand if this isn't an option.

Use just the homing interface

This system should work fine for your current implementation, which appears to just set the Z position based on the trigger time from the mcu. If you want to do this, then I recommend you implement just the six main calls listed above and let the probe.HomingViaProbeHelper/probe.ProbeSessionHelper classes translate the probing command interface to these six calls. (In the master branch, you also have to implement a probing_move() call, but with #6879 that isn't needed.)

The disadvantage of this system is that the probe position is determined by the mcu trigger time - both for G28 homing and for PROBE type commands - and it will be very difficult to change that. So, it'll get you up and running with the code in this PR, but you'll not be in a good position to add future code on top of it.

Implement both the homing interface and the probe command interface

This option is more complicated. It does allow for the use of G28 and it will allow for the use of high precision probing with the PROBE type commands. (A G28 Z0 still wont be any more precise than mcu trigger time and I don't know of a way to do that without a refactor of the homing code.)

If you want to go this route, then I suggest clearly separating the two interfaces. For example, have a class that implements the 4 basic probe command calls, and have a separate class that implements the 6+ homing via probe interface calls. I made a similar distinction in probe_eddy_current.py as part of #6879 .

Let me know what you think. -Kevin

Apr 05 '25 00:04 KevinOConnor

It does allow for the use of G28 and it will allow for the use of high precision probing with the PROBE type commands. (A G28 Z0 still wont be any more precise than mcu trigger time and I don't know of a way to do that without a refactor of the homing code.)

I actually have a very similar macro to the one wrote already because G28 doesn't home as accurately as PROBE. A macro is portable and pretty easy to set up for most people. (Asking them to attach physical hardware is the deal breaker)

I thought I needed to home the axis first because PROBE can fail. But separately, I'm working on adding a LOAD_CELL_TAP command that's like a single probe that doesn't throw errors. This was intended for controlling data collection. Single taps in a PROBE that fail invoke a nozzle scrubber routine with arbitrary gcode (this is in the next PR). That isn't valid to do when the z rail isn't homed.

But I could home the rail with this LOAD_CELL_TAP command, taking either the precise result of TapAnalysis or the MCU timing result (PROBE & LOAD_CELL_TAP always produce one of those 2 results). Then I can look at the TapAnalysis result and if it wasn't valid, invoke PROBE to re-try homing with nozzle cleaning enabled.

[gcode_macro _HOME_Z_FROM_PROBE]
gcode:
    SET_KINEMATIC_POSITION Z={ printer.toolhead.position[2] - (printer.probe.last_z_result) }

[gcode_macro _HOME_Z_FROM_TAP]
gcode:
    _HOME_Z_FROM_PROBE
    {% if not printer.probe.last_tap_is_valid %}
        # tap failed, try again with probing which will clean the nozzle
        PROBE
        _HOME_Z_FROM_PROBE
    {% endif %}

[gcode_macro _HOME_Z]
gcode:
    SET_KINEMATIC_POSITION Z={printer.toolhead.axis_maximum[2]}
    LOAD_CELL_TAP
    _SET_Z_FROM_TAP

(In my testing I had to add a move after the PROBE or the printer.toolhead.position[2] was off by the pullback move length, so it may be slightly more complex than this)

That would get this out of the G28 business. In the best case it homes the machine accurately with a single tap. Let me see what testers think.

Apr 05 '25 05:04 garethky

That would get this out of the G28 business. In the best case it homes the machine accurately with a single tap. Let me see what testers think.

This is okay

Apr 05 '25 07:04 EllaFoxo

Maybe just "load_cell.c" or "load_cell_tap.c"?

"load_cell_probe.c" ?

Apr 09 '25 20:04 garethky

Maybe just "load_cell.c" or "load_cell_tap.c"?

"load_cell_probe.c" ?

Sounds okay to me.

-Kevin

Apr 10 '25 00:04 KevinOConnor

As mentioned previously, I've "mocked up" some trsync.c code for verifying sensors remain active. It is at: https://github.com/KevinOConnor/klipper-dev/tree/work-trsync-20250411

I'm not sure using the trsync "report timer" mechanism is the best way to accomplish sensor checking, but maybe the code will give some ideas.

-Kevin

Apr 12 '25 17:04 KevinOConnor

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards, ~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

Apr 27 '25 00:04 github-actions[bot]

Thanks. I rebased and committed this PR. I made some small modifications to the commit messages (no code changes) and merged as commits cb0c38f7, 42c9031c, 69507a03, 3dbac01e, b3e894f2, f6d878a8, and 388fe1b2.

I also made a couple of minor code fixes on top - commit eb43b20e and 8d7e4871.

Thanks again, -Kevin

May 29 '25 23:05 KevinOConnor

This ticket is being closed because the underlying issue is now thought to be resolved.

Best regards, ~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

Jun 07 '25 00:06 github-actions[bot]