traefik access log parser fails

Open kumy opened this issue 3 years ago • 3 comments

Hi,

I have logs from traefik 1.7 that fail to parse:

207.xx.xx.xx - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms

I looked into what is happening here and found that the pattern for TRAEFIK_ROUTER is too restrictive: https://github.com/crowdsecurity/hub/blob/7f5129b9a38eac06670e303be167ea7e817889cd/parsers/s01-parse/crowdsecurity/traefik-logs.yaml#L8

While checking the referenced CLF format, I don't see any description of the values each field can take, so I wonder why this one accepts only that pattern :shrug:.

Changing the line to make the user part optional fixes the issue with cscli explain:

TRAEFIK_ROUTER: '((%{USER}@)?%{URIHOST}|\-)'
cscli explain -l '207.0.1.2 - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms' --type traefik
line: 207.0.1.2 - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms
	├ s00-raw
	|	├ 🟢 crowdsecurity/non-syslog (first_parser)
	|	└ 🔴 crowdsecurity/syslog-logs
	├ s01-parse
	|	├ 🔴 crowdsecurity/sshd-logs
	|	└ 🟢 crowdsecurity/traefik-logs (+22 ~2)
	├ s02-enrich
	|	├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
	|	├ 🟢 crowdsecurity/geoip-enrich (+13)
	|	├ 🟢 crowdsecurity/http-logs (+7)
	|	└ 🟢 crowdsecurity/whitelists (unchanged)
	├-------- parser success 🟢
	├ Scenarios
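
To illustrate why the optional user part makes a difference, here is a minimal Python sketch. The `USER`, `HOSTNAME` and `URIHOST` regexes are simplified stand-ins I wrote for the grok patterns of the same name (an assumption, not the exact definitions shipped with crowdsec), and `accepts` is a hypothetical helper.

```python
import re

# Simplified stand-ins for the grok patterns; assumptions, not the exact hub definitions.
USER = r"[a-zA-Z0-9._-]+"
HOSTNAME = r"[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*"
URIHOST = rf"{HOSTNAME}(?::[0-9]+)?"

# Current hub pattern: (%{USER}@%{URIHOST}|\-)
ORIGINAL = re.compile(rf"(?:{USER}@{URIHOST}|-)")
# Proposed pattern: ((%{USER}@)?%{URIHOST}|\-)
OPTIONAL_USER = re.compile(rf"(?:(?:{USER}@)?{URIHOST}|-)")


def accepts(pattern: re.Pattern, router_name: str) -> bool:
    """Return True when the whole router name matches the pattern."""
    return pattern.fullmatch(router_name) is not None


if __name__ == "__main__":
    names = ("test@docker", "Host-domain-com-www-domain-com-wp-domain-com-0", "-")
    for name in names:
        print(name, accepts(ORIGINAL, name), accepts(OPTIONAL_USER, name))
```

Under these assumptions the traefik 1.7 name has no `@`, so the original pattern rejects it, while the relaxed one accepts it as a bare host-like token and still accepts `test@docker` and `-`.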

I would like to create a PR, but this change breaks many unit tests.

kumy, Aug 31 '22 12:08

Out of curiosity, which tests did it break? It might help to move forward on this. I don't see any reason we should block this :)

buixor, Sep 27 '22 15:09

@buixor I made this change to the test log:

$ git df
diff --git a/.tests/traefik_clf/traefik_clf.log b/.tests/traefik_clf/traefik_clf.log
index ddc887f..ac7255b 100644
--- a/.tests/traefik_clf/traefik_clf.log
+++ b/.tests/traefik_clf/traefik_clf.log
@@ -1,6 +1,6 @@
 172.17.0.1 - - [08/Dec/2021:09:16:05 +0000] "GET /scripts/cpshost.dll HTTP/1.1" 200 414 "-" "-" 500 "test@docker" "http://172.17.0.3:80" 0ms
 172.17.0.1 - - [08/Dec/2021:09:16:05 +0000] "GET /upload.asp?test=toto&tata=test HTTP/1.1" 200 405 "-" "-" 502 "test@docker" "http://172.17.0.3:80" 0ms
 172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /cgi.cgi/ HTTP/1.1" 200 352 "-" "Nikto" 240 "test@docker" "http://172.17.0.3:80" 0ms
-172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352 "-" "Nikto" 242 "test@docker" "http://172.17.0.3:80" 1ms
+172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352 "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80" 1ms
 
 

Then:

$ cscli hubtest run traefik_clf
INFO[27-09-2022 06:07:52 PM] Running test 'traefik_clf'                   

ERRO[27-09-2022 06:07:56 PM] Parser test 'traefik_clf' failed (10 errors) 
(L.20)  🔴  => results["s00-raw"]["crowdsecurity/non-syslog"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
        Actual expression values:
            results["s00-raw"]["crowdsecurity/non-syslog"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
  "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
  1ms'

(L.123)  🔴  => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

(L.126)  🔴  => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
        Actual expression values:
            results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
  "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
  1ms'

(L.134)  🔴  => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Meta["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

(L.229)  🔴  => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

(L.241)  🔴  => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
  "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
  1ms'

(L.245)  🔴  => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Meta["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

(L.366)  🔴  => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
  "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
  1ms'

(L.382)  🔴  => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

(L.391)  🔴  => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Meta["traefik_router_name"] == "test@docker"
        Actual expression values:
            results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'

? 
Do you want to remove runtime folder for test 'traefik_clf'? (default: Yes) No
---------------------
 TEST         RESULT 
---------------------
 traefik_clf  ❌     
---------------------

kumy, Sep 27 '22 16:09

I had the same issue with traefik v1.7. This patch worked for me, but I did not test it with traefik > v2:

diff --git a/parsers/s01-parse/crowdsecurity/traefik-logs.yaml b/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
index 6022b0f..c91bc9f 100644
--- a/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
+++ b/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
@@ -5,10 +5,11 @@ filter: "evt.Parsed.program startsWith 'traefik'"
 #debug: true
 onsuccess: next_stage
 pattern_syntax:
-  TRAEFIK_ROUTER: '(%{USER}@%{URIHOST}|\-)'
-  TRAEFIK_SERVER_URL: '(%{URI}|\-)'
+  TRAEFIK_ROUTER: '(%{USER}@%{URIHOST}|%{NOTDQUOTE})'
+  TRAEFIK_SERVER_URL: '(%{URI}|%{NOTDQUOTE})'
   NUMBER_MINUS: '[0-9-]+'
-  NGINXACCESS2: '%{IPORHOST:remote_addr} - %{NGUSER:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER_MINUS:status} %{NUMBER_MINUS:body_bytes_sent} "%{NOTDQUOTE:http_referer}" "%{NOTDQUOTE:http_user_agent}"'
+  NGUSER2: '(?:%{NGUSER}\s)?'
+  NGINXACCESS2: '%{IPORHOST:remote_addr} - %{NGUSER2:remote_user}- \[%{HTTPDATE:time_local}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER_MINUS:status} %{NUMBER_MINUS:body_bytes_sent} "%{NOTDQUOTE:http_referer}" "%{NOTDQUOTE:http_user_agent}"'
 nodes:
   - grok: # CLF parser
       pattern: '%{NGINXACCESS2} %{NUMBER:number_of_requests_received_since_traefik_started} "%{TRAEFIK_ROUTER:traefik_router_name}" "%{TRAEFIK_SERVER_URL:traefik_server_url}" %{NUMBER:request_duration_in_ms}ms'
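
For comparison, a rough Python sketch of how the `%{NOTDQUOTE}` fallback in this patch differs from the optional-user variant proposed earlier in the thread. The regexes are simplified stand-ins for the grok patterns (in particular, `NOTDQUOTE` is assumed to be `[^"]*`), `frontend_web` is a made-up router name, and `accepts` is a hypothetical helper. The `NGINXACCESS2` change is not modelled here.

```python
import re

# Simplified stand-ins for the grok patterns; assumptions, not the exact hub definitions.
USER = r"[a-zA-Z0-9._-]+"
HOSTNAME = r"[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*"
URIHOST = rf"{HOSTNAME}(?::[0-9]+)?"
NOTDQUOTE = r'[^"]*'  # assumed: anything except a double quote

# Variant from the first comment: "((%{USER}@)?%{URIHOST}|\-)"
OPTIONAL_USER = re.compile(rf'"(?:(?:{USER}@)?{URIHOST}|-)"')
# Variant from this patch: "(%{USER}@%{URIHOST}|%{NOTDQUOTE})"
NOTDQUOTE_FALLBACK = re.compile(rf'"(?:{USER}@{URIHOST}|{NOTDQUOTE})"')


def accepts(pattern: re.Pattern, quoted_field: str) -> bool:
    """Return True when the whole quoted log field matches the pattern."""
    return pattern.fullmatch(quoted_field) is not None


if __name__ == "__main__":
    fields = (
        '"test@docker"',
        '"Host-domain-com-www-domain-com-wp-domain-com-0"',
        '"frontend_web"',  # hypothetical name with a character outside the hostname set
    )
    for field in fields:
        print(field, accepts(OPTIONAL_USER, field), accepts(NOTDQUOTE_FALLBACK, field))
```

The fallback is the more permissive of the two: it accepts any quoted value, so it should not reject unusual router names, at the cost of no longer validating the field's shape.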

Please let me know if you want a pull request.

Adphi, Oct 10 '22 14:10