ModSecurity icon indicating copy to clipboard operation
ModSecurity copied to clipboard

feat: fix regression test, add a new CI workflow

Open airween opened this issue 2 months ago • 10 comments

what

This PR fixes the last broken regression test and adds the whole regression test workflow to GH CI.

why

The regression tests were almost finished, now I found another broken test. After fixed that I prepared a new workflow for GH CI, where we run these regression tests.

I switched the operation system to Ubuntu 24.04 (from 22.04). In Ubuntu 24.04 the old PCRE (libpcre3) is not installed by default, I had to add it explicitly.

I tried to follow the build matrix as it's in other workflows, but a complete regression test takes about 29 minutes - I think that's very slow. On my build machine (which is not a strong one) it takes about 6-7 minutes.

Unfortunately we can't fix this, the reason is that the operation of a test: each tests creates a config, start an Apache server with that config, send a request and check the output (files, response, etc).

I don't know if it's worth it.

references

#3425

airween avatar Oct 31 '25 20:10 airween

I looked at run-regression-tests.pl.in quickly and it seems to me that match_log might be the issue. Log flushing can differ significantly between systems, so it's possible that searching the log in each test takes a long time. I suggest you add some debug print statements to the Perl script so that you can see in the CI output which operations use a lot of time.

theseion avatar Nov 01 '25 08:11 theseion

I tried to review the script again, and fixed two small things in bbe5239:

  • switched to select() from sleep()
  • clear the buffer which hold the read log

Also I removed the test regression/misc/10-tfn-cache.t, because it took many time (on my machine the whole test file was 1m 6secs). The reason is that it seems SecCacheTransformations directive is deprecated.

Until I write this comment, the tests has started, but it seems that no performance improvement. (And beside of that there are two failed tests...)

airween avatar Nov 07 '25 19:11 airween

Okay, I think the reason is visible now but unknown: see the output of the last test. The Apache stop time is terrible slow.

And I don't know why...

@theseion, @dune73 do you have any idea?

airween avatar Nov 07 '25 20:11 airween

If I remember correctly, Apache waits for connections to be closed or whatever. I do not think it sends a RST or something, but it really waits. So if it's slow to stop, I would look at the connection table during the stopping.

Stopping Apache in production is always slow. In a test situation that's not ideal. Maybe you want to kill it instead.

Take a look at my script https://www.netnea.com/files/apachex that I use for control of Apache in my lab. It's a very rough tool, more a mallet than a screwdriver, but it never fails me to kill and restore apache instantly.

Not sure this is the right path, but that's what I have in mind.

dune73 avatar Nov 07 '25 22:11 dune73

There are probably two things you can do:

  1. Test request should be sent with Connection: close
  2. Kill httpd

The first is just a nice-to-have of course. I'd absolutely simply kill httpd. There's no reason for handling open connections gracefully in this scenario.

theseion avatar Nov 08 '25 07:11 theseion

Thanks for ideas,

1. Test request should be sent with `Connection: close`

All requests send Connection: close header (I checked with tcpdump), eg:

10:55:19.058098 IP localhost.omniorb > localhost.33008: Flags [P.], seq 1:225, ack 144, win 512, options [nop,nop,TS val 2093795869 ecr 2093795869], length 224
E...~d@.@..}............J..................
|...|...HTTP/1.1 200 OK
Date: Sat, 08 Nov 2025 09:55:19 GMT
Server: Apache/2.4.58 (Ubuntu)
Last-Modified: Sat, 08 Nov 2025 09:46:10 GMT
ETag: "5-643122b0cc4e7"
Accept-Ranges: bytes
Content-Length: 5
Connection: close

TEST

2. Kill httpd

I tried, but then many other weird issues came out. Eg. when httpd_start() function is called, eg:

Loaded 2 tests from /home/airween/src/modsecurity-apache/tests/regression/config/00-load-modsec.t
  1) config - module loaded: passed
Httpd start failed with error messages:
httpd (pid 10748) already running

The first is just a nice-to-have of course. I'd absolutely simply kill httpd. There's no reason for handling open connections gracefully in this scenario.

I'm afraid I have to reorganize the complete start/stop mechanism.

airween avatar Nov 08 '25 10:11 airween

I'm afraid I have to reorganize the complete start/stop mechanism.

How are you killing the process(es)? It looks like you did kill and then start_httpd() immediately, so your requests start being processed before the old process has been killed. Maybe you need to put a waitpid in there.

theseion avatar Nov 09 '25 14:11 theseion

I'm afraid I have to reorganize the complete start/stop mechanism.

How are you killing the process(es)?

here is the current routine which stops httpd.

This section builds and runs the command:

    my @p = (
        $HTTPD,
        -d => $opt{S},
        -f => $opt{C},
        (map { (-c => $_) } ("Listen localhost:$opt{p}", @_)),
        -k => "stop",
    );

    my $httpd_out;
    my $httpd_pid = open3(undef, $httpd_out, undef, @p) or quit(1);

@p holds the command and open3() runs that. It's approximately httpd -k stop.

It looks like you did kill and then start_httpd() immediately, so your requests start being processed before the old process has been killed. Maybe you need to put a waitpid in there.

There is a waitipd() already..

Edit: it seems like the waitpid() waits for terminating of httpd -k stop command. That takes a lot of time.

airween avatar Nov 09 '25 15:11 airween

Edit: it seems like the waitpid() waits for terminating of httpd -k stop command. That takes a lot of time.

Yes, exactly. I was thinking of kill + waitpid (if necessary).

theseion avatar Nov 09 '25 15:11 theseion