loki
CEF (Common Event Format) parser in LogQL
Is your feature request related to a problem? Please describe.
CEF (Common Event Format) is a logging format used by some tools. Loki and Promtail should have a built-in parser for it so that I can extract CEF fields as labels. CEF messages look something like:
CEF:0|Palo Alto Networks|Cortex XDR|Cortex XDR 2.4|XDR Analytics|High Connection Rate|6|end=1601792870694 shost=WGHRAMG deviceFacility=None cat=Discovery externalId=98106342 request=https:\/\/iga-bh.xdr.eu.paloaltonetworks.com\/alerts\/98106342 cs1=iexplore.exe cs1Label=Initiated by cs2=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 cs2Label=Initiator CMD cs3=Microsoft CorporationSIGNATURE_SIGNED- cs3Label=Signature cs4=iexplore.exe cs4Label=CGO name cs5=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 cs5Label=CGO CMD cs6=Microsoft CorporationSIGNATURE_SIGNED- cs6Label=CGO Signature dst=10.12.4.37 dpt=8000 src=10.10.28.140 spt=58003 fileHash=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 filePath=C:\\\\Program Files (x86)\\\\Internet Explorer\\\\iexplore.exe targetprocesssignature=NoneSIGNATURE_UNAVAILABLE- tenantname=iGA tenantCDLid=1021319191 CSPaccountname=Information & eGovernment Authority initiatorSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 initiatorPath=C:\\\\Program Files (x86)\\\\Internet Explorer\\\\iexplore.exe cgoSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 osParentName=iexplore.exe osParentCmd=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 osParentSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 osParentSignature=SIGNATURE_SIGNED osParentSigner=Microsoft Corporation incident=118719 act=Detected suser=['root']
That is, a pipe-delimited format in which the last field is a series of key/value pairs with their own escaping rules.
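To make the structure concrete, here is a minimal Python sketch of the header split. It assumes the usual CEF convention that a literal pipe or backslash inside a header field is written as `\|` or `\\`; the function name `split_cef` and the short field names are only illustrative and are not part of Loki or Promtail.

```python
# Illustrative names for the seven pipe-delimited CEF header fields.
HEADER_FIELDS = ["v", "vendor", "product", "version", "signature", "name", "severity"]


def split_cef(line):
    """Split a CEF line into its seven header fields and the raw extension string."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    fields, buf, i = [], [], len("CEF:")
    while i < len(line) and len(fields) < 7:
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "\\|":
            # Assumed header escaping: \| is a literal pipe, \\ a literal backslash.
            buf.append(line[i + 1])
            i += 2
        elif ch == "|":
            # An unescaped pipe closes the current header field.
            fields.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    if len(fields) == 6:
        # The header ended without a trailing pipe, so there is no extension.
        fields.append("".join(buf))
    if len(fields) != 7:
        raise ValueError("incomplete CEF header")
    record = dict(zip(HEADER_FIELDS, fields))
    record["extension"] = line[i:]  # everything after the seventh pipe, left unparsed
    return record


record = split_cef(
    "CEF:0|Palo Alto Networks|Cortex XDR|Cortex XDR 2.4|XDR Analytics|"
    "High Connection Rate|6|end=1601792870694 shost=WGHRAMG"
)
print(record["vendor"])     # Palo Alto Networks
print(record["extension"])  # end=1601792870694 shost=WGHRAMG
```

The extension is deliberately returned as one raw string, because its key/value pairs follow different escaping rules from the header.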
Describe the solution you'd like
Similar to the LogQL parsers json and logfmt, there should be a cef parser.
Additional context
My first use-case is to parse logs from Cortex XDR which are sent as CEF payloads over syslog (see https://docs.paloaltonetworks.com/cortex/cortex-xdr/cortex-xdr-pro-admin/logs/cortex-xdr-log-notification-formats/alert-notification-format.html)
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
Please label revivable?
still valid
The basic structure of the CEF format can be parsed using the pattern parser.
Example:
{filename="/var/log/cef.log"} |= "cat=Discovery" | pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`
However, if you also want to parse the key/value pairs of the extension, it gets tricky. Although the extension looks very much like logfmt, it is not actually logfmt. You can still use regular line filters, as in the example above.
At least with the logs coming from Cortex XDR, all the interesting things to alert on are in the extension section. We'd like to extract them to add to alerts generated from loki ruler.
That's what I thought. I guess a regex guru may be able to write an expression for parsing the key value pairs, but it would be very inefficient. :(
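For reference, outside of LogQL the key/value split is fairly small. The Python sketch below is an illustration only, not something Loki offers: it assumes keys are plain word characters and that any `=` inside a value is escaped as `\=`, so an unescaped `key=` preceded by whitespace always starts a new pair. The name `parse_extension` is made up for this example.

```python
import re

# A key is a run of word characters followed by "=", at the start of the
# string or right after whitespace. An escaped "\=" never matches, because
# the backslash sits between the word characters and the "=".
KEY_RE = re.compile(r"(?:^|\s)(\w+)=")


def parse_extension(ext):
    """Split a CEF extension string into a dict of still-escaped values."""
    matches = list(KEY_RE.finditer(ext))
    pairs = {}
    for cur, nxt in zip(matches, matches[1:] + [None]):
        # A value runs from the end of its "key=" to the start of the next key.
        end = nxt.start() if nxt else len(ext)
        pairs[cur.group(1)] = ext[cur.end():end]
    return pairs


pairs = parse_extension("cs1=iexplore.exe cs1Label=Initiated by dst=10.12.4.37 dpt=8000")
print(pairs["cs1Label"])  # Initiated by
print(pairs["dst"])       # 10.12.4.37
```

Values containing spaces, such as `cs1Label=Initiated by`, are what make this differ from logfmt. The sketch does not un-escape values, and it would mis-split a value that contains an unescaped ` word=` sequence.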
If you don't need to parse all key value pairs of the extension field, it may be relatively easy:
{filename="/var/log/cef.log"}
|= "cat=Discovery"
| pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`
| label_format original=`CEF:{{.v}}|{{.vendor}}|{{.product}}|{{.version}}|{{.signature}}|{{.name}}|{{.severity}}|{{.extension}}`
| line_format `{{.extension}}`
| regexp `end=(?P<end>[^\s]+)`
| end > 1600000000000
| line_format `{{.original}}`
And since you want to alert on certain things, I assume you don't even need the last line_format stage, because you're doing a metrics query:
sum by (shost) (
  count_over_time(
    {filename="/var/log/cef.log"}
      |= "cat=Discovery"
      | pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`
      | line_format `{{.extension}}`
      | regexp `end=(?P<end>[^\s]+)`
      | regexp `shost=(?P<shost>[^\s]+)`
      | end > 1600000000000
    [$__interval]
  )
)
Don't blame me for the performance ;-)
Even that doesn't help with the un-escaping process.
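As a rough illustration of what that un-escaping step involves, here is a Python sketch. It assumes the commonly documented CEF extension escapes (`\\`, `\=`, `\n`, `\r`) and leaves every other backslash sequence untouched; `unescape_value` is a hypothetical helper name, not an existing Loki function.

```python
import re

# Assumed CEF extension escapes, keyed by the character after the backslash.
_ESCAPES = {"\\": "\\", "=": "=", "n": "\n", "r": "\r"}


def unescape_value(value):
    """Undo the assumed escapes in one left-to-right pass."""
    # A single pass matters: "\\n" must become a backslash followed by "n",
    # not a newline, which chained str.replace calls could get wrong.
    return re.sub(r"\\([\\=nr])", lambda m: _ESCAPES[m.group(1)], value)


print(unescape_value(r"a\=b"))               # a=b
print(unescape_value(r"C:\\Program Files"))  # C:\Program Files
```

The sample message at the top of this issue also contains sequences such as `\/` and `\\\\`, which suggests an extra layer of escaping added in transit. This sketch leaves `\/` as is and reduces `\\\\` to two backslashes.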
Though I think it does show why a dedicated cef parser would be useful :)
Agreed, a dedicated parser would be more useful. If we want to do this, we'll need to think about whether to integrate the parser into LogQL or into Promtail, where the line could be transformed into a different format.
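To illustrate the "transform into a different format" option, the Python sketch below renders already-parsed CEF parts as a single JSON line, which the existing json parser could then read. It is only an illustration of the idea, not a Promtail stage: `cef_to_json` and both of its arguments are hypothetical, and the parsing step that would produce them is assumed to exist elsewhere.

```python
import json


def cef_to_json(header, extension):
    """Render parsed CEF parts as one JSON log line.

    `header` and `extension` are hypothetical inputs: dicts produced by
    whatever CEF parsing step runs before this one.
    """
    doc = dict(header)
    # Keep extension keys in their own object so they cannot clash with header fields.
    doc["extension"] = dict(extension)
    return json.dumps(doc, sort_keys=True)


line = cef_to_json(
    {"vendor": "Palo Alto Networks", "severity": "6"},
    {"shost": "WGHRAMG", "cat": "Discovery"},
)
print(line)
# {"extension": {"cat": "Discovery", "shost": "WGHRAMG"}, "severity": "6", "vendor": "Palo Alto Networks"}
```

With a line like this, `| json` should expose the nested keys as flattened labels such as `extension_shost`, assuming the json parser's usual underscore flattening of nested objects.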
we'll need to think about whether to integrate the parser into LogQL or into Promtail, where the line could be transformed into a different format.
@chaudum any further thoughts on this?
Is there any progress in making a CEF parser?
I tried searching around the internet for some CEF specification, but I soft-failed... There are some vendors that kind of explain the formatting, but some actual specification/RFC would help a lot