
Question Apache log parser

Open greg-FR13 opened this issue 5 years ago • 2 comments

Hi All,

I am a little bit lost. I am using the following rule:

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Apache's Log 1 :
...
XX.XX.XX.XX - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
XX.XX.XX.XX - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
XX.XX.XX.XX - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

# head -1 /var/log/httpd/my.access_log | /usr/bin/lognormalizer -r apache_access_log.rule -e json
{ "originalmsg": "XX.XXX.XXX.XXX - "" [29/Jul/2018:03:53:53 +0000] "GET /robots.txt HTTP/1.1" 404 985", "unparsed-data": "" }

The rule works for other Apache logs; the problem only appears when the log contains "".

How can I deal with %auth:word% and ""?

Thank you for your help and support,

Regards,

greg-FR13, Aug 01 '18 14:08

Hello @greg-FR13,

Your rule does not match the logs you posted, because those log messages contain no referrer or user agent part.

For your logs:

192.168.1.1 - "Tester" [29/Jul/2018:05:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
192.168.1.1 - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

this rule matches:

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %blob:rest%

and when you run:

lognormalizer -H -p -r apache.rule < apache.log

it produces the following results:

{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:05:15:47 +0000", "auth": "\"Tester\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:06:15:47 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "985", "response": 404, "httpversion": 1.1, "request": "\/robots.txt", "verb": "GET", "timestamp": "29\/Jul\/2018:07:09:05 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:08:20:39 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }

To also parse the user agent and referrer parts, you have two options:

  1. Either provide another rule with a higher priority than the one above in the %response% rule field.
  2. Or enhance the existing rule with an alternative parser.
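
As a rough, untested sketch of the first option, a rulebase could carry two rules that share the same prefix and differ only after the byte count. The field names `bytes`, `referrer` and `agent` are taken from the rule in the question. The `version=2` header is what liblognorm expects at the top of a v2 rulebase. This sketch sets no explicit priority: it assumes the two rules can coexist because they diverge only after `%bytes%`, so please verify it against your own logs.

```
version=2

# Combined format (assumed layout): common fields, then quoted referrer and user agent.
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %bytes:number{"format":"number"}% "%referrer:char-to:"%" "%agent:char-to:"%"

# Common format: the line ends right after the byte count, as in the logs above.
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %bytes:number{"format":"number"}%
```

With a rulebase like this, the short lines above would be expected to match the second rule, and lines that also carry a referrer and user agent would be expected to match the first.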

Keep in mind that liblognorm rules are not regular expressions. They are compiled into a Directed Acyclic Graph (DAG), so the parser may handle them differently than you would expect. For more information, please refer to the official documentation.

Best regards,
Christos

manios, Sep 11 '18 06:09

Hi @manios, thank you for your complete answer; I will have a look.

Best,

greg-FR13, Sep 14 '18 13:09