traefik access log parser fails
Hi,
I have logs from traefik 1.7 that fail to parse:
207.xx.xx.xx - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms
I looked into what is happening here and found that the pattern for TRAEFIK_ROUTER is too restrictive:
https://github.com/crowdsecurity/hub/blob/7f5129b9a38eac06670e303be167ea7e817889cd/parsers/s01-parse/crowdsecurity/traefik-logs.yaml#L8
While checking the referenced CLF format, I don't see any description of which values each field can take, so I wonder why this one accepts only such a pattern :shrug:.
Changing the line to make the user part optional fixes the issue according to cscli explain:
TRAEFIK_ROUTER: '((%{USER}@)?%{URIHOST}|\-)'
cscli explain -l '207.0.1.2 - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms' --type traefik
line: 207.0.1.2 - - [31/Aug/2022:12:19:15 +0000] "POST /xmlrpc.php HTTP/1.1" 200 230 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64rv:95.0) Gecko/20100101 Firefox/95.0" 4155 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.23.1.8:80" 47ms
├ s00-raw
| ├ 🟢 crowdsecurity/non-syslog (first_parser)
| └ 🔴 crowdsecurity/syslog-logs
├ s01-parse
| ├ 🔴 crowdsecurity/sshd-logs
| └ 🟢 crowdsecurity/traefik-logs (+22 ~2)
├ s02-enrich
| ├ 🟢 crowdsecurity/dateparse-enrich (+2 ~1)
| ├ 🟢 crowdsecurity/geoip-enrich (+13)
| ├ 🟢 crowdsecurity/http-logs (+7)
| └ 🟢 crowdsecurity/whitelists (unchanged)
├-------- parser success 🟢
├ Scenarios
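To illustrate why making the user part optional matters, here is a rough Python sketch. The regexes for USER and URIHOST are simplified stand-ins that I am assuming for illustration, not the actual grok definitions, and `accepts` is a hypothetical helper. It only compares how the original and the proposed TRAEFIK_ROUTER alternation behave on the router names seen in this thread.

```python
import re

# Simplified approximations of the grok patterns (assumptions for illustration,
# not the real grok definitions).
USER = r"[a-zA-Z0-9._-]+"
LABEL = r"[0-9A-Za-z][0-9A-Za-z-]{0,62}"
URIHOST = rf"{LABEL}(?:\.{LABEL})*(?::[0-9]+)?"

# Original shape: (%{USER}@%{URIHOST}|\-)
ORIGINAL = re.compile(rf"(?:{USER}@{URIHOST}|-)")
# Proposed shape: ((%{USER}@)?%{URIHOST}|\-)
PROPOSED = re.compile(rf"(?:(?:{USER}@)?{URIHOST}|-)")


def accepts(pattern: re.Pattern, router_name: str) -> bool:
    """Return True if the whole router name matches the pattern."""
    return pattern.fullmatch(router_name) is not None


if __name__ == "__main__":
    names = [
        "test@docker",                                     # traefik v2 style
        "Host-domain-com-www-domain-com-wp-domain-com-0",  # traefik 1.7 style
        "-",                                               # empty field
    ]
    for name in names:
        print(name, accepts(ORIGINAL, name), accepts(PROPOSED, name))
```

Under these assumptions, only the proposed pattern accepts the traefik 1.7 router name, because the original always requires an `@`.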
I would like to create a PR, but the change breaks many unit tests.
Out of curiosity, which tests did it break? It might help to move forward on this; I don't see any reason we should block this :)
@buixor I made this change:
$ git df
diff --git a/.tests/traefik_clf/traefik_clf.log b/.tests/traefik_clf/traefik_clf.log
index ddc887f..ac7255b 100644
--- a/.tests/traefik_clf/traefik_clf.log
+++ b/.tests/traefik_clf/traefik_clf.log
@@ -1,6 +1,6 @@
172.17.0.1 - - [08/Dec/2021:09:16:05 +0000] "GET /scripts/cpshost.dll HTTP/1.1" 200 414 "-" "-" 500 "test@docker" "http://172.17.0.3:80" 0ms
172.17.0.1 - - [08/Dec/2021:09:16:05 +0000] "GET /upload.asp?test=toto&tata=test HTTP/1.1" 200 405 "-" "-" 502 "test@docker" "http://172.17.0.3:80" 0ms
172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /cgi.cgi/ HTTP/1.1" 200 352 "-" "Nikto" 240 "test@docker" "http://172.17.0.3:80" 0ms
-172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352 "-" "Nikto" 242 "test@docker" "http://172.17.0.3:80" 1ms
+172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352 "-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80" 1ms
Then:
$ cscli hubtest run traefik_clf
INFO[27-09-2022 06:07:52 PM] Running test 'traefik_clf'
ERRO[27-09-2022 06:07:56 PM] Parser test 'traefik_clf' failed (10 errors)
(L.20) 🔴 => results["s00-raw"]["crowdsecurity/non-syslog"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
Actual expression values:
results["s00-raw"]["crowdsecurity/non-syslog"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
"-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
1ms'
(L.123) 🔴 => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
Actual expression values:
results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
(L.126) 🔴 => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
Actual expression values:
results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
"-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
1ms'
(L.134) 🔴 => results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Meta["traefik_router_name"] == "test@docker"
Actual expression values:
results["s01-parse"]["crowdsecurity/traefik-logs"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
(L.229) 🔴 => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
Actual expression values:
results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
(L.241) 🔴 => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
Actual expression values:
results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
"-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
1ms'
(L.245) 🔴 => results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Meta["traefik_router_name"] == "test@docker"
Actual expression values:
results["s02-enrich"]["crowdsecurity/dateparse-enrich"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
(L.366) 🔴 => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["message"] == "172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] \"GET /index?toto=tata HTTP/1.1\" 200 352 \"-\" \"Nikto\" 242 \"test@docker\" \"http://172.17.0.3:80\" 1ms"
Actual expression values:
results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["message"] = '172.17.0.1 - - [08/Dec/2021:13:59:39 +0000] "GET /index?toto=tata HTTP/1.1" 200 352
"-" "Nikto" 242 "Host-domain-com-www-domain-com-wp-domain-com-0" "http://172.17.0.3:80"
1ms'
(L.382) 🔴 => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["traefik_router_name"] == "test@docker"
Actual expression values:
results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Parsed["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
(L.391) 🔴 => results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Meta["traefik_router_name"] == "test@docker"
Actual expression values:
results["s02-enrich"]["crowdsecurity/http-logs"][3].Evt.Meta["traefik_router_name"] = 'Host-domain-com-www-domain-com-wp-domain-com-0'
? Do you want to remove runtime folder for test 'traefik_clf'? (default: Yes) No
---------------------
TEST RESULT
---------------------
traefik_clf ❌
---------------------
I had the same issue with traefik v1.7. This patch worked for me, but I did not test it with traefik > v2:
diff --git a/parsers/s01-parse/crowdsecurity/traefik-logs.yaml b/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
index 6022b0f..c91bc9f 100644
--- a/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
+++ b/parsers/s01-parse/crowdsecurity/traefik-logs.yaml
@@ -5,10 +5,11 @@ filter: "evt.Parsed.program startsWith 'traefik'"
#debug: true
onsuccess: next_stage
pattern_syntax:
- TRAEFIK_ROUTER: '(%{USER}@%{URIHOST}|\-)'
- TRAEFIK_SERVER_URL: '(%{URI}|\-)'
+ TRAEFIK_ROUTER: '(%{USER}@%{URIHOST}|%{NOTDQUOTE})'
+ TRAEFIK_SERVER_URL: '(%{URI}|%{NOTDQUOTE})'
NUMBER_MINUS: '[0-9-]+'
- NGINXACCESS2: '%{IPORHOST:remote_addr} - %{NGUSER:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER_MINUS:status} %{NUMBER_MINUS:body_bytes_sent} "%{NOTDQUOTE:http_referer}" "%{NOTDQUOTE:http_user_agent}"'
+ NGUSER2: '(?:%{NGUSER}\s)?'
+ NGINXACCESS2: '%{IPORHOST:remote_addr} - %{NGUSER2:remote_user}- \[%{HTTPDATE:time_local}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER_MINUS:status} %{NUMBER_MINUS:body_bytes_sent} "%{NOTDQUOTE:http_referer}" "%{NOTDQUOTE:http_user_agent}"'
nodes:
- grok: # CLF parser
pattern: '%{NGINXACCESS2} %{NUMBER:number_of_requests_received_since_traefik_started} "%{TRAEFIK_ROUTER:traefik_router_name}" "%{TRAEFIK_SERVER_URL:traefik_server_url}" %{NUMBER:request_duration_in_ms}ms'
Please let me know if you want a pull request.
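For comparison, here is a rough Python sketch of what the NOTDQUOTE fallback in the patch above would accept. I am assuming NOTDQUOTE behaves like `[^"]*`; the USER and URIHOST regexes are simplified stand-ins, not the real grok definitions, and `accepts` is a hypothetical helper.

```python
import re

# Simplified approximations (assumptions for illustration only).
USER = r"[a-zA-Z0-9._-]+"
LABEL = r"[0-9A-Za-z][0-9A-Za-z-]{0,62}"
URIHOST = rf"{LABEL}(?:\.{LABEL})*(?::[0-9]+)?"
NOTDQUOTE = r'[^"]*'  # assumed: any run of characters other than a double quote

# Patched shape: (%{USER}@%{URIHOST}|%{NOTDQUOTE})
FALLBACK = re.compile(rf"(?:{USER}@{URIHOST}|{NOTDQUOTE})")


def accepts(pattern: re.Pattern, router_name: str) -> bool:
    """Return True if the whole router name matches the pattern."""
    return pattern.fullmatch(router_name) is not None


if __name__ == "__main__":
    for name in [
        "test@docker",
        "Host-domain-com-www-domain-com-wp-domain-com-0",
        "-",
        "",               # an empty field also matches
        'broken"name',    # contains a double quote, so it is rejected
    ]:
        print(repr(name), accepts(FALLBACK, name))
```

If that assumption holds, this variant is more permissive than making the user part optional: it accepts any router name without a double quote, including an empty one.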