nagios-plugins-linux icon indicating copy to clipboard operation
nagios-plugins-linux copied to clipboard

[Support] Information about `check_memory` vendored into Debian's `nagios-plugins-contrib`

Open amotl opened this issue 2 years ago • 6 comments

Dear Davide,

first things first: Thanks a stack for conceiving and maintaining this collection of excellent check programs.

Introduction

Because this is absolutely not a bug report related to your software at all, please apologize for not using the issue template. It is merely a report where we want to share some observations and caveats when operating Icinga on Debian machines and upgrading to newer releases like Debian bullseye. In this manner, we are humbly asking you to label this issue with info, support or similar and close it right away if you see fit.

Observations

It looks like the nagios-plugins-linux Debian source package recently vendored your monitoring programs ^1, effectively replacing its previous check_memory program. It happened with those commits:

@Flowdalic already reported that to the Debian bugtracker at #1005908:

In version 35 check_memory is a Perl script and the arguments are the low-water mark of free memory, after which an warning or critical result will be reported. In version 37 check_memory is an ELF and the arguments are the high-water mark of used memory after which a warning or critical result will be reported. [...] Hence, invoking the program with the previous parameters will yield different results depending on the installed version. This is clearly not desirable and would require that users need to adjust their monitoring setup to work around the behavior change.

So, effectively, for Debian users transitioning to nagios-plugins-contrib >= 36.20211207, the check semantics will be inverted. This is a somewhat unfortunate change for them.

Mitigations

As it is not really a bug report (your software works perfectly fine at all!) but more a support recipe for the community, our main aim is to share the steps we took to mitigate this problem on an Icinga instance monitoring systems running both old and new versions.

So, to all people using nagios-plugins-contrib or monitoring-plugins-contrib on Debian: The check_memory program by @madrisan has the --available option which should be used in this context of making the program's behavior compatible with the previous variant written in Perl. However, the program's --warning|--critical threshold control options will also need slight adjustments.

Effectively, if that command would have worked before:

check_memory --warning "20%" --critical "10%"

people should now use an invocation like that:

check_memory --available --warning "20%:" --critical "10%:"

Closing words

We hope you appreciate our report here despite the report should go to Debian instead. We will check how to respond to the original ticket at #1005908 properly, referencing this issue. If someone from the community can also do that, we will be more than happy.

With kind regards, Andreas.

amotl avatar Mar 29 '22 22:03 amotl

After knowing about the different call signature, the challenge was to adjust the Icinga configuration appropriately to support both variants. As we didn't find a way to inquire the information at runtime which package version would be installed, like

$ dpkg-query --show --showformat '${Package} ${Version}\n' monitoring-plugins-contrib
monitoring-plugins-contrib 37.20220312

we decided to introduce and propagate another flag to from the host object to the service object and further to the command object. Based on this flag, the call signature to the check_memory program is adjusted at runtime.

Hereby, we are sharing the corresponding configuration snippets. Hopefully, they will be helpful to all people who are in need of them.

object Host "foobar.example.org" {

    vars.os = "Debian"
    vars.kernel = "Linux"

    // Newer shipments of `check_memory` have a different call signature.
    // Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
    vars.memory_madrisan = true
}
apply Service "Main memory" {

  check_command = "memory"
  assign where host.vars.kernel == "Linux"
  ignore where host.vars.check_memory == false

  // Define thresholds.
  vars.warning = host.vars.memory_warning || "20%"
  vars.critical = host.vars.memory_critical || "10%"

  // Newer shipments of `check_memory` have a different call signature.
  // Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
  vars.madrisan = host.vars.memory_madrisan
  if (vars.madrisan) {
    vars.warning  += ":"
    vars.critical += ":"
  }
}
object CheckCommand "memory" {
    /*
    Universal memory check command accounting for different 
    variants of »check_memory« installed on the system.

    Newer shipments of `check_memory` have a different call signature.
    Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
    */
    import "plugin-check-command"

    command = [ "/usr/lib/nagios/plugins/check_memory" ]

    arguments = {
        "--available" = {
            description = "display the free/available memory"
            set_if = "$madrisan$"
        }
        "--warning" = {
            value = "$warning$"
            set_if = {{ if ( len(macro("$warning$")) != 0 ) { return 1 } else { return 0 } }}
            description = "Trigger warning when available memory is lower than X"
        }
        "--critical" = {
            value = "$critical$"
            set_if = {{ if ( len(macro("$critical$")) != 0 ) { return 1 } else { return 0 } }}
            description = "Trigger critical when available memory is lower than X"
        }
    }

}

amotl avatar Mar 29 '22 22:03 amotl

Another option would have been not to swap in the program under the same name ^1 while retiring the old one. In this case, the program would be installed as /usr/lib/nagios/check_madrisan_memory and could have been handled a bit better in transitioning scenarios. However, this suggestion is really meant to be addressed to the Debian maintainers.

amotl avatar Mar 29 '22 22:03 amotl

Dear @Flowdalic,

we can see that you acknowledged our original post, thank you. Can we humbly ask you to forward the relevant information to the Debian issue tracker if you can find some minutes? Maybe it will be enough to refer to this ticket?

Thank you in advance and with kind regards, Andreas.

amotl avatar Mar 30 '22 10:03 amotl

Thanks for your report @amotl I was not aware of this commit in Debian :+1: So, please tell me if I can do something for enhancing the compatibility with the previous Debian package. Perhaps a compilation switch?

madrisan avatar Mar 30 '22 16:03 madrisan

Dear Davide,

thank you for your kind reply on this issue. We just pinged @bzed and @waja about it at https://github.com/bzed/pkg-nagios-plugins-contrib/issues/93, trying to condense the relevant information further.

We would like to have them and Peter Palfrader have their verdict what would be best for the Debian vendoring as we are not accustomed to the corresponding details. However, we wanted to share our thoughts on the two different aspects which make the new implementation of check_memory incompatible with the old one, from the perspective of an average Debian user, and how to improve the situation slightly.

1. Threshold value evaluation inversion

Perhaps a compilation switch?

We don't know about a good solution to the aspect of the --available option yet. While a compilation switch would be able to essentially invert the operation of your check_memory program, wouldn't that be very confusing for users in the long run to find your program installed on their machines in two different variants? Currently, we see two options.

1.1. Messy workaround

Maybe it will be sufficient to advise adding this snippet to the CheckCommand configuration section?

"--available" = {
    description = "display the free/available memory"
    set_if = "$madrisan$"
}

However, people who are not aware of this will experience "silent corruption".

1.2. Break things instead of screwing things up

Like mentioned at https://github.com/madrisan/nagios-plugins-linux/issues/104#issuecomment-1082445888, a safer option would be to adjust the retirement commit https://github.com/bzed/pkg-nagios-plugins-contrib/commit/eadc7856 to ship the program as /usr/lib/nagios/check_madrisan_memory like before, while still removing the old Perl implementation. At least this will signal users that something has changed instead of screwing up the sensor integrity.

2. Threshold value syntax

The other aspect are the additional : colon characters on the threshold values, currently needed by @madrisan's implementation. We compensated for this with that snippet of Icinga configuration initially shared at https://github.com/madrisan/nagios-plugins-linux/issues/104#issuecomment-1082439499:

  // Newer shipments of `check_memory` have a different call signature.
  // Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
  vars.madrisan = host.vars.memory_madrisan
  if (vars.madrisan) {
    vars.warning  += ":"
    vars.critical += ":"
  }

please tell me if I can do something for enhancing the compatibility

If it would fit the semantics of your program, would it be possible to make those colon characters optional? In this manner, we could reduce the footprint of the workaround at least slightly.

With kind regards, Andreas.

amotl avatar Mar 30 '22 18:03 amotl

As described in the official nagios documentation, the two syntaxes are not equivalent: 20: means less than 20 while 20 would mean in our context greater than 20. Note also that the range is parsed by the function parse_range_string which is the same coded in the nagios plugins. So I cannot make the : optional.

madrisan avatar Apr 02 '22 22:04 madrisan

Hii, I am in the same boat, migration hosts from Debian 11 to Debian 12, and now the memory checks fail. But, with a little ternary shell logic things can work on both versions.

Instead of this:

/usr/lib/nagios/plugins/check_memory -w 20% -c 10%

we now use this:

/usr/lib/nagios/plugins/check_memory "$(( $(lsb_release -rs) < 12 )) && echo -w 20% -c 10% || echo -a -w 20%: -c 10%:"

We use the check_by_ssh wrapper, so in reality the command_line looks like this:

 command_line /usr/lib/nagios/plugins/check_by_ssh -q -t 60 -H '$HOSTADDRESS$' '/usr/lib/nagios/plugins/check_memory "$(( $(lsb_release -rs) < 12 )) && echo -w $ARG1$ -c $ARG2$ || echo -a -w $ARG1$: -c $ARG2$:"'

dnmvisser avatar Sep 26 '23 12:09 dnmvisser

Hi, thanks for the report. So weird. I'll check ASAP.

madrisan avatar Sep 26 '23 12:09 madrisan

Can you open a new issue please?

madrisan avatar Sep 26 '23 12:09 madrisan

Can you open a new issue please?

My comment/workaround was more meant for reference by others that start using Debian 12 and end up in the same situation. The problem is not so much with this/your script, but rather with the choice of Debian to move from one script to another (incompatible) script.

dnmvisser avatar Sep 26 '23 13:09 dnmvisser

The problem is not so much with this/your script, but rather with the choice of Debian to move from one script to another (incompatible) script.

It's not incompatible, but you might need to parameter it in another way. And yes, it requires to adjust your configuration.

waja avatar Sep 26 '23 18:09 waja