
Extend documentation for re_extract function

Open deoren opened this issue 7 years ago • 0 comments

Current state of doc

Current description of function arguments:

re_extract(expr, re, match, submatch, no-found)

Current description:

extracts data from a string (property) via a regular expression match. POSIX ERE regular expressions are used. The variable “match” contains the number of the match to use. This permits to pick up more than the first expression match. Submatch is the submatch to match (max 50 supported). The “no-found” parameter specifies which string is to be returned in case when the regular expression is not found. Note that match and submatch start with zero. It currently is not possible to extract more than one submatch with a single call.

Problem

While working with this function today I had a lot of trouble figuring out how to use it. I eventually figured it out, but mainly by trial and error.

The variable “match” contains the number of the match to use. This permits to pick up more than the first expression match.

Does this mean that you can have more than one match against the message? Alright, I think I understand that part.
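To make the "more than one match" idea concrete, here is a minimal Python sketch. It assumes that "match" simply counts the non-overlapping hits of the whole regex from left to right, starting at zero. The log line and pattern are hypothetical, and Python's re module is only a stand-in for the POSIX ERE engine that rsyslog uses.

```python
import re

# Hypothetical log line in which the same pattern hits twice.
msg = "user=alice ip=10.0.0.1 user=bob ip=10.0.0.2"

# Collect every non-overlapping hit of the whole regex, left to right.
matches = [m.group(0) for m in re.finditer(r"user=[a-z]+", msg)]

# Assumption: "match 0" would select the first hit, "match 1" the second.
first_match = matches[0]
second_match = matches[1]

print(first_match)   # user=alice
print(second_match)  # user=bob
```

Under that assumption, the match argument answers "which hit in the message?", independently of any parenthesized groups inside the regex.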

Submatch is the submatch to match (max 50 supported).

What is a submatch? I'm used to the idea of group matches, so when I saw that there was a match parameter, I assumed it was where I would specify that I wanted, say, the second group in a pattern.

Sample string: Server bk_postfix/relay5 is UP/READY (leaving forced maintenance).

Sample regex (optimizations ignored): Server (.*)\\/(.*)

If I want the "relay5" string, I reference group 1 (where the counting starts at 0). Based on trial and error, it appears that I would call re_extract like so:

set $.tempvar = re_extract($msg, "Server (.*)\\/(.*)", 0, 1, "failed match");

Why is using match 0, submatch 1 the correct thing to do in order to extract "relay5"?
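For comparison, this is how group numbering works in Python's re module, where group 0 is the entire matched text and the parenthesized groups are numbered from 1. Whether rsyslog's submatch argument follows exactly the same numbering is an assumption here, not something the current doc confirms. The tightened pattern below is also an assumption for illustration, not the regex from this issue.

```python
import re

msg = "Server bk_postfix/relay5 is UP/READY (leaving forced maintenance)."

# Tightened variant of the sample regex (illustrative assumption):
# [^/]+ stops at the first slash, [^ ]+ stops at the next space.
m = re.search(r"Server ([^/]+)/([^ ]+)", msg)

whole = m.group(0)   # entire matched text
first = m.group(1)   # first parenthesized group
second = m.group(2)  # second parenthesized group

print(whole)   # Server bk_postfix/relay5
print(first)   # bk_postfix
print(second)  # relay5

# With the greedy form, the first (.*) runs up to the LAST slash.
greedy = re.search(r"Server (.*)/(.*)", msg)
print(greedy.group(1))  # bk_postfix/relay5 is UP
```

If submatch maps onto this numbering, then submatch 0 would return the whole match and the groups would start at 1, which is worth stating explicitly in the doc either way.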

The rest of the details for the re_extract function:

The “no-found” parameter specifies which string is to be returned in case when the regular expression is not found. Note that match and submatch start with zero. It currently is not possible to extract more than one submatch with a single call.

If not for the mention of submatch, I would have thought that simply referencing group 1 like this would have worked:

set $.tempvar = re_extract($msg, "(.*)([0-9]+)(.*)", 1, "failed match");

What to change

While I know that the holes in my regex knowledge are mostly to blame, I think it would help to add some basic examples: one where there is just one match (with a few submatches), and at least one where there are multiple matches and we want a specific submatch from one of them.
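As a rough idea of what such examples could show, here is a Python sketch of the two cases. The helper re_extract_sketch is hypothetical and is not rsyslog code. It assumes that match picks the nth non-overlapping hit, that submatch picks a group within that hit (0 being the whole hit), and that the no-found string is returned when either is missing. The message and pattern are made up, and Python's re stands in for POSIX ERE.

```python
import re

def re_extract_sketch(expr, pattern, match, submatch, no_found):
    """Hypothetical stand-in for re_extract; argument order follows the doc."""
    hits = list(re.finditer(pattern, expr))
    if match >= len(hits):
        return no_found            # fewer hits than requested
    hit = hits[match]
    if submatch > hit.re.groups:
        return no_found            # pattern has no such group
    value = hit.group(submatch)
    return no_found if value is None else value

# Made-up message in which the pattern hits twice.
msg = "eth0 rx=120 tx=45 eth1 rx=7 tx=9"
pattern = r"(eth[0-9]+) rx=([0-9]+)"

# Case 1: one match, several submatches (stay on match 0, vary submatch).
print(re_extract_sketch(msg, pattern, 0, 0, "failed match"))  # eth0 rx=120
print(re_extract_sketch(msg, pattern, 0, 1, "failed match"))  # eth0
print(re_extract_sketch(msg, pattern, 0, 2, "failed match"))  # 120

# Case 2: several matches (move to match 1, then pick a submatch).
print(re_extract_sketch(msg, pattern, 1, 1, "failed match"))  # eth1
print(re_extract_sketch(msg, pattern, 1, 2, "failed match"))  # 7

# No third hit exists, so the no-found string comes back.
print(re_extract_sketch(msg, pattern, 2, 0, "failed match"))  # failed match
```

Real doc examples would of course use RainerScript and be verified against rsyslog itself; the sketch only shows the shape of the two cases being requested.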

deoren, Feb 16 '18 23:02