go-ftw icon indicating copy to clipboard operation
go-ftw copied to clipboard

Race condition in phase 5

Open theseion opened this issue 3 years ago • 10 comments

Tests for rules in phase 5 are subject to a rare race condition, as exhibited in the following log for tests 980170-2 and 980170-3 (line breaks between rules added for readability):

[Sat Mar 18 16:47:21.474075 2023] [security2:error] [pid 193:tid 140523746522880] [client 172.18.0.1:39150] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "04709bb3-6b2b-477d-aa14-76a89363b2dd"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg7QAAANg"]
[Sat Mar 18 16:47:21.476378 2023] [security2:error] [pid 193:tid 140524082149120] [client 172.18.0.1:39164] [client 172.18.0.1] ModSecurity: Warning. Pattern match "(?:^([\\\\d.]+|\\\\[[\\\\da-f:]+\\\\]|[\\\\da-f:]+)(:[\\\\d]+)?$)" at REQUEST_HEADERS:Host. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf"] [line "761"] [id "920350"] [msg "Host header is a numeric IP address"] [data "127.0.0.1"] [severity "WARNING"] [ver "OWASP_CRS/4.0.0-rc1"] [tag "modsecurity"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-protocol"] [tag "paranoia-level/1"] [tag "OWASP_CRS"] [tag "capec/1000/210/272"] [tag "PCI/6.5.10"] [hostname "127.0.0.1"] [uri "/"] [unique_id "ZBXrGVXqtKqlnATxdUEg7gAAAMQ"]
[Sat Mar 18 16:47:21.480771 2023] [security2:error] [pid 193:tid 140524098930432] [client 172.18.0.1:39172] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "04709bb3-6b2b-477d-aa14-76a89363b2dd"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg7wAAAMM"]


[Sat Mar 18 16:47:21.483110 2023] [security2:error] [pid 35:tid 140523931117312] [client 172.18.0.1:39180] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "c50b1e8e-e8a5-46d1-afc8-bf82368390f4"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGcs3WOXFKxhOjQ9D-AAAAE0"]
[Sat Mar 18 16:47:21.483333 2023] [security2:error] [pid 193:tid 140524082149120] [client 172.18.0.1:39164] [client 172.18.0.1] ModSecurity: Warning. Unconditional match in SecAction. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "96"] [id "980170"] [msg "Anomaly Scores: (Inbound Scores: blocking=3, detection=3, per_pl=3-0-0-0, threshold=5) - (Outbound Scores: blocking=0, detection=0, per_pl=0-0-0-0, threshold=4) - (SQLI=0, XSS=0, RFI=0, LFI=0, RCE=0, PHPI=0, HTTP=0, SESS=0, COMBINED_SCORE=3)"] [ver "OWASP_CRS/4.0.0-rc1"] [tag "modsecurity"] [tag "reporting"] [hostname "127.0.0.1"] [uri "/"] [unique_id "ZBXrGVXqtKqlnATxdUEg7gAAAMQ"]
[Sat Mar 18 16:47:21.489998 2023] [security2:error] [pid 193:tid 140523830429440] [client 172.18.0.1:39206] [client 172.18.0.1] ModSecurity: Warning. Pattern match "^.*$" at REQUEST_HEADERS:X-CRS-Test. [file "/etc/modsecurity.d/owasp-crs/crs-setup.conf"] [line "737"] [id "999999"] [msg "c50b1e8e-e8a5-46d1-afc8-bf82368390f4"] [tag "modsecurity"] [hostname "localhost"] [uri "/status/200"] [unique_id "ZBXrGVXqtKqlnATxdUEg8AAAANM"]

Rule 920350 is triggered as expected for the first test but rule 980170 is not. However, looking at the unique_id field it becomes clear that rule 980170 did in fact trigger but was reported as part of the second rule (the msg field of rule 999999 denotes start and stop ID of a rule).

The logic for finding triggered rules in the log is:

  1. send test request to mark start
  2. run test
  3. send test request to mark end
  4. while end marker not in log
    1. send test request to flush log
  5. check test against lines between start and end markers

Apparently, the log entry for rule 980170 may under some circumstances be written to the log much later and, which is worse, independently of other log entries of the same phase (both 999999 and 980170 run in phase 5), otherwise it would have shown up before the start marker for test 980170-3.

theseion avatar Mar 18 '23 19:03 theseion

Is there an easy fix for this?

fzipi avatar Mar 29 '23 12:03 fzipi

Not that I'm aware of. I didn't dive deeply though.

theseion avatar Mar 29 '23 18:03 theseion

One thing that I see in common there is the unique_id. Maybe we can use it?

fzipi avatar May 24 '23 22:05 fzipi

Maybe... Does that work reliably?

theseion avatar May 25 '23 05:05 theseion

This happened again and caused an automated test run to fail on the CRS repo the other day. So, this definitely remains an issue.

RedXanadu avatar Aug 14 '23 22:08 RedXanadu

We can't use unique_id because we don't know how many lines to expect.

I have an idea how to solve this though. Unfortunately, it will require replacing the current mechanism (maybe with a transition phase). The issue we are facing is that a test request will run in a thread / worker that is different from the marker requests. Hence, the marker requests may complete while processing of the test request is still ongoing. The fix thus has to be to use the same thread / worker as the test request.

For this to work we will need an additional rule that logs in phase 5 of every test request. Then we know that we can look for this special log entry to find the end of the log lines for a given test request. Essentially, the end marker we currently use would be replaced with a different end marker. The start marker can still be used because there's no race condition in that case.

I'm pretty sure this will work, however, I'm not sure wether a drop action would skip phase 5. If that is the case, then we're pretty much screwed... @dune73?

theseion avatar Sep 17 '23 15:09 theseion

Ping @dune73.

theseion avatar Nov 05 '23 23:11 theseion

Sorry, I overlooked this so far.

Unfortunately, phase 5 is indeed skipped when the drop action is used. I suspect this is because the idea with drop is to free the thread immediately from the connection. I wonder how garbage collection works in this case ...

Either way, it's an unfortunate situation. But then I would not invest too much into phase 5. The only rule we use in phase 5 is 980170 (CRSv4) and it's not a very important rule. I would probably be pragmatic and test at reporting level 0 where the rule is inactive.

dune73 avatar Nov 06 '23 06:11 dune73

I have been encountering this problem more and more frequently over the past week or so, for what it's worth. Several PRs have been marked as 'failing tests' in error.

RedXanadu avatar Nov 09 '23 09:11 RedXanadu

Thanks. I'm working on a workaround.

theseion avatar Nov 09 '23 10:11 theseion

retry_once has been implemented.

theseion avatar May 20 '24 14:05 theseion