[chefctl] Add a new hook: `skip_run?`

Open jaymzh opened this issue 1 month ago • 5 comments

This adds a new hook point (and sample plugin usage) that allows the Chef run to be skipped based on some local criteria.

Example usage might be:

  • Device is on battery
  • Device is not connected to VPN/backhaul/etc.
  • Some global service meant to disable runs during an emergency

Previously I did this in pre_run or pre_start, but the only way to skip a run from those hooks is to force an exit, which messes up the logs because the links never get updated. This hook provides a clean way to skip the run while still updating the chef.{cur,last} links, so it's clear what happened.
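For illustration, a battery check behind such a hook could look like the sketch below. This is a minimal example under stated assumptions, not the sample plugin from this change: the module name `BatterySkipHook`, the helper `on_battery?`, and the use of Linux sysfs (`/sys/class/power_supply`) are all hypothetical choices, and registering the module with chefctl is omitted.

```ruby
# Hypothetical plugin module; the name and the sysfs-based check are
# assumptions for illustration, not code from this change.
module BatterySkipHook
  POWER_SUPPLY_DIR = '/sys/class/power_supply'.freeze

  # Returns true when at least one mains (AC) supply is present
  # and none of them reports being online.
  def self.on_battery?(dir = POWER_SUPPLY_DIR)
    mains = Dir.glob(File.join(dir, '*')).select do |supply|
      type_file = File.join(supply, 'type')
      File.exist?(type_file) && File.read(type_file).strip == 'Mains'
    end
    # No AC adapter entry at all (e.g. desktops, servers): never skip.
    return false if mains.empty?

    mains.none? do |supply|
      online = File.join(supply, 'online')
      File.exist?(online) && File.read(online).strip == '1'
    end
  end

  # Hook body: true means "skip this run", false means "run as normal".
  def skip_run?
    BatterySkipHook.on_battery?
  end
end
```

Taking the directory as a parameter keeps the check testable against a temporary directory instead of the real sysfs tree.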

Sample output:

$ sudo chefctl -iv
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Loading plugin at /etc/cinc/chefctl_hooks.rb.
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Including registered plugin KrHook
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Trying lock /var/lock/subsys/chefctl
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Lock acquired: /var/lock/subsys/chefctl
[2025-11-13 18:27:44 +1000] INFO chefctl: taste-tester mode ends in < 1 hour, extending back to 1 hour
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Skippinbg battery check due to --immediate flag

and

$ sudo chefctl
[2025-11-13 18:27:22 +1000] INFO chefctl: taste-tester mode ends in < 1 hour, extending back to 1 hour
[2025-11-13 18:27:22 +1000] WARN chefctl: Running on battery power, skipping Chef run
[2025-11-13 18:27:22 +1000] INFO chefctl: Plugin requested skipping chef run.

Signed-off-by: Phil Dibowitz [email protected]

jaymzh Nov 13 '25 08:11

Well that has some risks because you don't necessarily want cron to think it failed.

I have two possible compromises that I'm cool with, but I much prefer the first.

option 1

The hook returns false (do the run), true (skip the run, return success), or an int, in which case we skip the run and use that int as the exit value. This allows the hook to decide how the skip should be handled.

option 2

We add a config option, --error-on-skipped-run or some such, which forces it to exit with a predetermined exit code when a run is skipped.

I think the first one is more flexible for all sorts of use-cases (you could have multiple return codes for WHY it failed, including some of them being 0)... but if there's a reason I'm missing to choose the second, I'm open to hearing it.
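As a rough sketch of how option 1 could be interpreted on the caller's side: the helper name `interpret_skip_result` is hypothetical, and treating `nil` the same as `false` is an assumption, not something settled in this thread.

```ruby
# Hypothetical helper: maps a skip_run? return value to [skip?, exit_code].
# exit_code is nil when the run should proceed normally.
def interpret_skip_result(result)
  case result
  when false, nil then [false, nil]   # do the run (nil handling is an assumption)
  when true       then [true, 0]      # skip the run, report success
  when Integer    then [true, result] # skip the run, exit with the given code
  else
    raise ArgumentError, "unexpected skip_run? result: #{result.inspect}"
  end
end

skip, code = interpret_skip_result(3)
puts "skip=#{skip} exit=#{code}"
```

With this shape a hook can return 0 for skips that cron should treat as success, and a nonzero int for skips that should be surfaced as failures.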

jaymzh Nov 14 '25 07:11

@dafyddcrosby - ping? Which would you prefer?

jaymzh Nov 25 '25 19:11

I think the first option should be fine. As long as we're not in a position where no Chef run has actually happened for $period_of_time and we have no way of determining that it was skipped, we should be good.

dafyddcrosby Dec 01 '25 10:12

Awesome, thanks, I'll modify accordingly.

jaymzh Dec 01 '25 21:12

OK, option 1 implemented. Plus a few typos in comments fixed.

jaymzh Dec 01 '25 22:12