ha_cluster_exporter

corosync parser error: could not parse node id in corosync-quorumtool output: could not find Node ID line

Open jamesyu558 opened this issue 4 years ago • 23 comments

Hi Support,

The following corosync parser error about the "Node ID" occurs on v1.2.0, so I upgraded ha_cluster_exporter from v1.2.0 to the latest version, v1.2.1, on my RHEL7 VM. Unfortunately, the error still occurs on v1.2.1.

The error message is below; note that the field the corosync parser complains about is "Node ID": msg="'corosync' collector scrape failed: corosync parser error: could not parse node id in corosync-quorumtool output: could not find Node ID line"

See below:

# corosync-quorumtool
Quorum information
------------------
Date:             Thu Apr  8 09:55:31 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1/568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 XXXXXXXXXX
         2          1 XXXXXXXXXX (local)

Can you please help?

jamesyu558 avatar Apr 08 '21 14:04 jamesyu558

I'm not able to reproduce this issue: your example matches the regex we're using to parse quorumtool output. What's the output of ha_cluster_exporter --version?

stefanotorresi avatar Apr 08 '21 14:04 stefanotorresi

Here it is:

cd /var/lib/pacemaker_exporter/

ls -l

total 18436
-rwxr-xr-x. 1 postgres postgres 9437184 Apr 6 08:37 ha_cluster_exporter-amd64

./ha_cluster_exporter-amd64 --version

version 1.2.1+git.1606912430.4fceb77 built with go1.15.5 linux/amd64 2020-12-02T17:30:26+00:00

jamesyu558 avatar Apr 08 '21 15:04 jamesyu558

If you have a debug module, I can install it and see exactly what happens with this parser error. Please let me know if you need more information from me. I really appreciate your help!

jamesyu558 avatar Apr 08 '21 15:04 jamesyu558

In my environment, I have pacemaker installed as well, together with this Prometheus exporter for Grafana.

jamesyu558 avatar Apr 08 '21 15:04 jamesyu558

Nope, we don't have a debug module. I guess the best shot you have is to download the sources and run it with a step debugger to inspect what input is being actually fed to the regex here: https://github.com/ClusterLabs/ha_cluster_exporter/blob/4fceb77b3a195bbce12f54e23569a66e20f50bc3/collector/corosync/parser.go#L85-L93

Btw, what corosync version are you using?

stefanotorresi avatar Apr 08 '21 16:04 stefanotorresi

hold on let me check

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

corosync -v

Corosync Cluster Engine, version '2.4.3' Copyright (c) 2006-2009 Red Hat, Inc.

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

How exactly do I debug this on RHEL7? Do you have specific steps to set it up?

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

Or modify the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called?

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

You could clone the project and then use https://github.com/go-delve/delve to debug it, but that assumes some familiarity with the Go language and toolkit!

stefanotorresi avatar Apr 08 '21 16:04 stefanotorresi

Thanks, I can figure this out. I'll let you know soon what value of "quorumToolOutput" is passed to this function. Thank you again.

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

Or modify the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called?

yes, you could also do that by adding

log.Debug(string(quorumToolOutput)) 

after line 85

stefanotorresi avatar Apr 08 '21 16:04 stefanotorresi

even better...thx

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

Will get back to you tomorrow morning this time....

jamesyu558 avatar Apr 08 '21 16:04 jamesyu558

We modified that function like this:

func parseNodeId(quorumToolOutput []byte) (string, error) {
	nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	var x = string(quorumToolOutput)
	if matches == nil {
		return "", errors.New("could NOT find Node ID line :" + x)
	}
	return string(matches[1]), nil
}

Then in the log, we see this: could not parse node id in corosync-quorumtool output: could NOT find Node ID line :"

Notice that we changed "not" to "NOT" on purpose, to check that the code picked up our changes. It looks like the x variable is empty.
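
When the input is appended to the message unquoted, an empty or whitespace-only input is hard to tell apart from a missing one. A minimal sketch of a debugging variant that quotes the input with %q and reports its length follows. It assumes the regex quoted in this thread; parseNodeIDVerbose is a hypothetical name and is not part of the exporter.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodeRe is the pattern quoted in this thread.
var nodeRe = regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)

// parseNodeIDVerbose is a hypothetical debugging helper (not exporter code).
// On failure it reports the input length and the input itself, quoted,
// so that an empty or whitespace-only input is visible in the log.
func parseNodeIDVerbose(out []byte) (string, error) {
	m := nodeRe.FindSubmatch(out)
	if m == nil {
		return "", fmt.Errorf("could not find Node ID line in %d bytes of input: %q", len(out), out)
	}
	return string(m[1]), nil
}

func main() {
	// Simulate the case seen in the log above: the parser receives nothing.
	_, err := parseNodeIDVerbose([]byte(""))
	fmt.Println(err)
}
```

An input length of 0 would point at the command producing no stdout rather than at the regex.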

Any more ideas?

jamesyu558 avatar Apr 09 '21 15:04 jamesyu558

Hello, is there any update about this issue?

borisjacquot avatar Dec 28 '22 08:12 borisjacquot

I need an example output from corosync-quorumtool to reproduce the issue. That is, an output that doesn't correctly match the (?m)Node ID:\s+(\w+) regular expression. You can verify that yourself at https://regex101.com/r/riyToT/1. As you can see, the example provided by OP matches correctly, so I don't know what's up there.

Until I get an actual example, there is not much I can do.
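
The same check can also be run locally instead of on regex101. The sketch below applies the regex quoted above to an excerpt of the output from the first post; findNodeID is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample is an excerpt of the corosync-quorumtool output from the first post.
const sample = `Quorum information
------------------
Date:             Thu Apr  8 09:55:31 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1/568
Quorate:          Yes
`

// findNodeID is a hypothetical helper: it applies the regex quoted in this
// thread and reports whether a Node ID line was found.
func findNodeID(out string) (string, bool) {
	re := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	m := re.FindStringSubmatch(out)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := findNodeID(sample)
	fmt.Println(id, ok)
}
```

To test a real capture, the sample constant can be replaced with the exact bytes the exporter process receives.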

stefanotorresi avatar May 02 '23 16:05 stefanotorresi

Hello @stefanotorresi, I have the same issue. Here is the output:

ha_cluster_exporter time="2023-05-02T18:37:54Z" level=warning msg="Corosync Collector scrape failed: could not parse ring id and seq number in corosync-quorumtool output: could not find Ring ID line"

Quorum information
------------------
Date:             Tue May  2 18:38:03 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          2.4a46b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1  
Flags:            2Node Quorate LastManStanding 

Membership information
----------------------
    Nodeid      Votes Name
         2          1 lb-int01.xxx.yyy.zzz (local)
         3          1 lb-int02.xxx.yyy.zzz

Issue on Debian 11
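
Note that the log line in this comment complains about the Ring ID line rather than the Node ID line, and that the Ring ID value here (2.4a46b) is written differently from the one in the first post (1/568). Whether that difference caused the failure on this exporter version is an assumption. The sketch below shows a hypothetical pattern that accepts both spellings; it is not the exporter's own regex, and parseRingID is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// ringRe is a hypothetical pattern, not taken from the exporter. It accepts
// both separators that appear in this thread: "1/568" and "2.4a46b".
var ringRe = regexp.MustCompile(`(?m)Ring ID:\s+(\w+)[/.](\w+)`)

// parseRingID returns the two parts of the Ring ID value as strings.
func parseRingID(out string) (first, second string, err error) {
	m := ringRe.FindStringSubmatch(out)
	if m == nil {
		return "", "", fmt.Errorf("could not find Ring ID line in %q", out)
	}
	return m[1], m[2], nil
}

func main() {
	for _, line := range []string{
		"Ring ID:          1/568",
		"Ring ID:          2.4a46b",
	} {
		a, b, err := parseRingID(line)
		fmt.Println(a, b, err)
	}
}
```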

adr1enb avatar May 02 '23 18:05 adr1enb

hmm, ok, that does match the regex, so it's not helping me either: https://regex101.com/r/JuhDCK/1

stefanotorresi avatar May 03 '23 17:05 stefanotorresi

oh, by the way, please always report the versions of the exporter and corosync you're using.

stefanotorresi avatar May 03 '23 17:05 stefanotorresi

Here it is :

corosync 3.1.2-2
ha_cluster_exporter-1.0.1

I've just updated to 1.3.2, it seems fixed :thinking:

adr1enb avatar May 10 '23 08:05 adr1enb

tl;dr: in case this helps anyone, make sure you test running corosync-quorumtool as the same user your ha_cluster_exporter process runs under, and check that it actually works for that user.


./ha-cluster-exporter --version
ha_cluster_exporter, version 1.3.3+git.1683650163.1000ba6 (branch: HEAD, revision: 1000ba696a5ef85737f70808a12e5a01bee5c281)
  build user:       runner@fv-az1100-952
  build date:       20230529-08:55:18
  go version:       go1.20.4
  platform:         linux/amd64
  tags:             netgo
$ corosync-quorumtool
Cannot initialize CMAP service

In this case (an unprivileged user), and I guess in other cases too, corosync-quorumtool exits with exit code 1, which is ignored as per this comment. stdout is empty, hence the failure to find a node ID, and stderr contains that error. The fix here was to make sure the user has the proper permissions so that corosync-quorumtool does not fail.

I guess a possible improvement would be ignoring the return code as is currently done but also failing when stdout is empty and stderr is not, since that might indicate failure of the command itself?
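
A rough sketch of that idea, under stated assumptions: checkOutput and runTool are hypothetical names, not functions from the exporter, and the real collector may invoke the tool differently. The exit code is still ignored, but an empty stdout combined with a non-empty stderr is reported as a failure.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// checkOutput implements the heuristic suggested above: empty stdout together
// with non-empty stderr is treated as a failure of the command itself.
func checkOutput(stdout, stderr []byte) error {
	if len(bytes.TrimSpace(stdout)) == 0 && len(bytes.TrimSpace(stderr)) > 0 {
		return fmt.Errorf("command produced no output; stderr: %s", bytes.TrimSpace(stderr))
	}
	return nil
}

// runTool is a hypothetical wrapper. It runs the command, ignores a non-zero
// exit code, and then applies checkOutput to the captured streams.
func runTool(name string, args ...string) ([]byte, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		// The command could not be started at all (for example, not found).
		return nil, err
	}
	if err := checkOutput(stdout.Bytes(), stderr.Bytes()); err != nil {
		return nil, err
	}
	return stdout.Bytes(), nil
}

func main() {
	out, err := runTool("corosync-quorumtool")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s", out)
}
```

With the unprivileged-user case shown above, this would surface "Cannot initialize CMAP service" in the error instead of a missing Node ID line.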

frazew avatar Oct 27 '23 16:10 frazew

failing when stdout is empty and stderr is not, since that might indicate failure of the command itself

That's a good suggestion! We'll look into implementing this tweak in the next iteration.

stefanotorresi avatar Feb 19 '24 17:02 stefanotorresi