varnish-cache icon indicating copy to clipboard operation
varnish-cache copied to clipboard

OSX: many tests fail unless autogen.sh is run

Open mjpieters opened this issue 3 years ago • 13 comments

Downloading the source tarbal from https://varnish-cache.org/releases/rel7.0.0.html#rel7-0-0, gives me a tarball that when compiled produces 165 test failures.

Expected Behavior

For the tests to pass after building.

Current Behavior

wget https://varnish-cache.org/_downloads/varnish-7.0.0.tgz
tar xzf varnish-7.0.0.tgz
cd varnish-7.0.0
./configure
make check
# ....
============================================================================
Testsuite summary for Varnish 7.0.0
============================================================================
# TOTAL: 795
# PASS:  600
# SKIP:  30
# XFAIL: 0
# FAIL:  165
# XPASS: 0
# ERROR: 0
============================================================================
See bin/varnishtest/test-suite.log
Please report to [email protected]
============================================================================

Possible Solution

Regenerating the build tooling solves this, mostly:

cd ..
rm -rf varnish-7.0.0
tar xzf varnish-7.0.0.tgz
cd varnish-7.0.0
./autogen.sh
./configure
make check
# ...
============================================================================
Testsuite summary for Varnish 7.0.0
============================================================================
# TOTAL: 795
# PASS:  766
# SKIP:  28
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See bin/varnishtest/test-suite.log
Please report to [email protected]
============================================================================

The remaining test failure is bin/varnishtest/tests/m00000.vtc; see the test-suite.log

Steps to Reproduce (for bugs)

See above.

Your Environment

  • Version used: 7.0.0
  • Operating System and version: CentOS Linux release 8.4.2105
  • Source of binary packages used (if any): CentOS rpms.

mjpieters avatar Oct 05 '21 19:10 mjpieters

And just like that, for no apparent reason, I am no longer able to reproduce this; I started a new, clean docker container and now everything Just Works. It's as if the issue is time sensitive. I found this first on MacOS, where the homebrew pre-built binary ("bottled") exhibits this exact behaviour. Building from source on MacOS had the same issues, until you run ./autogen.sh, then the test suite passes. I can still get the failures using the homebrew binaries, so the issue does produce a faulty binary. See my homebrew bug report for the failures.

I'll try again tomorrow.

mjpieters avatar Oct 05 '21 21:10 mjpieters

If we had the log with the 165 failures, we might be able to tell more about what could be going wrong. The single m00000.vtc failure seems to be related to vmods not being unloaded as expected

discarded   cold    busy         1    vcl8
active      auto    warm         0    vcl9

**** v1    vsl|          0 CLI             - Rd debug.vmod 
**** v1    vsl|          0 CLI             - Wr 200 163     1 std (/varnish-7.0.0/vmod/.libs/libvmod_std.so)
    1 debug (/varnish-7.0.0/vmod/.libs/libvmod_debug.so)
    2 vtc (/varnish-7.0.0/vmod/.libs/libvmod_vtc.so)

**** dT    3.433
**** v1    vsl|          0 CLI             - Rd ping
**** v1    vsl|          0 Debug           - vcl1: VCL_EVENT_COLD
**** v1    vsl|          0 CLI             - Wr 200 19 PONG 1633464632 1.0
**** dT    6.421
---- v1    Not true: vmods (3) == 0 (0)
*    top   RESETTING after ./tests/m00000.vtc

While we might want to look into that, it is nothing of concern.

Please do also note that MacOS is not a tier1 platform for us.

nigoroll avatar Oct 07 '21 14:10 nigoroll

I'll have to close this. I can only conclude I must've mixed up terminals. My apologies!

mjpieters avatar Oct 07 '21 17:10 mjpieters

FWIW, I am seeing this on macOS with 6.6.1 after running autogen. The tests are still running but the failures so far have been similar to

**** dT    0.000
*    top   TEST ./tests/b00037.vtc starting
**** top   extmacro def pwd=/Users/daves/opt/Downloads/varnish-6.6.1/bin/varnishtest
**** top   extmacro def localhost=127.0.0.1
**** top   extmacro def bad_backend=127.0.0.1:57409
**** top   extmacro def listen_addr=127.0.0.1:0
**** top   extmacro def bad_ip=192.0.2.255
**** top   extmacro def topbuild=/Users/daves/opt/Downloads/varnish-6.6.1
**** top   macro def testdir=/Users/daves/opt/Downloads/varnish-6.6.1/bin/varnishtest/./tests
**** top   macro def tmpdir=/var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409
**   top   === varnishtest "Error on multiple Host headers"
*    top   VTEST Error on multiple Host headers
**   top   === varnish v1 -vcl {backend be none;} -start
**** dT    0.016
**   v1    Launch
***  v1    CMD: cd ${pwd} && exec varnishd  -d -n /var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409/v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 57410' -P /var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409/v1/varnishd.pid -p vmod_path=/Users/daves/opt/Downloads/varnish-6.6.1/vmod/.libs 
***  v1    CMD: cd /Users/daves/opt/Downloads/varnish-6.6.1/bin/varnishtest && exec varnishd  -d -n /var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409/v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 57410' -P /var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409/v1/varnishd.pid -p vmod_path=/Users/daves/opt/Downloads/varnish-6.6.1/vmod/.libs 
***  v1    PID: 5754
**** v1    macro def v1_pid=5754
**** v1    macro def v1_name=/var/folders/48/hvbwyh4d2m5b78p_jqr4kjg0000231/T//vtc.5736.126c0409/v1
**** dT    0.061
***  v1    debug|Debug: Version: varnish-6.6.1 revision e6a8c860944c4f6a7e1af9f40674ea78bbdcdc66
***  v1    debug|Debug: Platform: Darwin,20.6.0,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
***  v1    debug|200 308     
***  v1    debug|-----------------------------
***  v1    debug|Varnish Cache CLI 1.0
***  v1    debug|-----------------------------
***  v1    debug|Darwin,20.6.0,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
***  v1    debug|varnish-6.6.1 revision e6a8c860944c4f6a7e1af9f40674ea78bbdcdc66
***  v1    debug|
***  v1    debug|Type 'help' for command list.
***  v1    debug|Type 'quit' to close CLI session.
***  v1    debug|Type 'start' to launch worker process.
***  v1    debug|
**** dT    0.153
**** v1    CLIPOLL 1 0x1 0x0
***  v1    CLI connection fd = 5
**** dT    0.154
***  v1    CLI RX  107
**** v1    CLI RX|hqlpnhfihjarrfgyaamgiiajmktfnyjo
**** v1    CLI RX|
**** v1    CLI RX|Authentication required.
**** dT    0.155
**** v1    CLI TX|auth 0bb4c161bf4157afbdfc97361d9949916dff201a50cad9ebe815a43c214e3de7
***  v1    CLI RX  200
**** v1    CLI RX|-----------------------------
**** v1    CLI RX|Varnish Cache CLI 1.0
**** v1    CLI RX|-----------------------------
**** v1    CLI RX|Darwin,20.6.0,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
**** v1    CLI RX|varnish-6.6.1 revision e6a8c860944c4f6a7e1af9f40674ea78bbdcdc66
**** v1    CLI RX|
**** v1    CLI RX|Type 'help' for command list.
**** v1    CLI RX|Type 'quit' to close CLI session.
**** v1    CLI RX|Type 'start' to launch worker process.
**** dT    0.156
**** v1    CLI TX|vcl.inline vcl1 << %XJEIFLH|)Xspa8P
**** v1    CLI TX|vcl 4.1;
**** v1    CLI TX|backend be none;
**** v1    CLI TX|%XJEIFLH|)Xspa8P
**** dT    0.257
***  v1    vsl|No VSL chunk found (child not started ?)
**** dT    0.362
***  v1    vsl|No VSL chunk found (child not started ?)
**** dT    0.466
***  v1    vsl|No VSL chunk found (child not started ?)
**** dT    0.540
***  v1    CLI RX  200
**** v1    CLI RX|VCL compiled.
**** v1    CLI TX|vcl.use vcl1
**** dT    0.541
***  v1    CLI RX  200
**   v1    Start
**** v1    CLI TX|start
**** dT    0.551
***  v1    debug|Debug: Child (5764) Started
**** dT    0.588
***  v1    CLI RX  200
***  v1    wait-running
**** v1    CLI TX|status
***  v1    debug|Info: Child (5764) said Child starts
***  v1    CLI RX  200
**** v1    CLI RX|Child in state running
**** v1    CLI TX|debug.listen_address
***  v1    CLI RX  200
**** v1    CLI RX|a0 127.0.0.1 57411
**** v1    CLI TX|debug.xid 999
***  v1    CLI RX  200
**** v1    CLI RX|XID is 999
**** v1    CLI TX|debug.listen_address
**** dT    0.589
***  v1    CLI RX  200
**** v1    CLI RX|a0 127.0.0.1 57411
**   v1    Listen on 127.0.0.1 57411
**** v1    macro def v1_addr=127.0.0.1
**** v1    macro def v1_port=57411
**** v1    macro def v1_sock=127.0.0.1:57411
**** v1    macro def v1_a0_addr=127.0.0.1
**** v1    macro def v1_a0_port=57411
**** v1    macro def v1_a0_sock=127.0.0.1:57411
**** dT    0.677
**** v1    vsl|          0 CLI             - Rd vcl.load "vcl1" vcl_vcl1.1635192618.875749/vgc.so 1auto
**** v1    vsl|          0 CLI             - Wr 200 52 Loaded "vcl_vcl1.1635192618.875749/vgc.so" as "vcl1"
**** v1    vsl|          0 CLI             - Rd vcl.use "vcl1"
**** v1    vsl|          0 CLI             - Wr 200 0 
**** v1    vsl|          0 CLI             - Rd start
**** v1    vsl|          0 CLI             - Wr 200 0 
**** v1    vsl|          0 CLI             - Rd debug.listen_address 
**** v1    vsl|          0 CLI             - Wr 200 19 a0 127.0.0.1 57411

**** v1    vsl|          0 CLI             - Rd debug.xid 999 
**** v1    vsl|          0 CLI             - Wr 200 10 XID is 999
**** v1    vsl|          0 CLI             - Rd debug.listen_address 
**** v1    vsl|          0 CLI             - Wr 200 19 a0 127.0.0.1 57411

**** dT    0.690
**   top   === client c1 {
**   c1    Starting client
**   c1    Waiting for client
**   c1    Started on 127.0.0.1:57411 (1 iterations)
***  c1    Connect to 127.0.0.1:57411
***  c1    connected fd 15 from 127.0.0.1 57419 to 127.0.0.1:57411
**   c1    === txreq -hdr "Host: foo" -hdr "Host: bar"
**** c1    txreq|GET / HTTP/1.1\r
**** c1    txreq|Host: foo\r
**** c1    txreq|Host: bar\r
**** c1    txreq|\r
**   c1    === rxresp
**** c1    rxhdr|HTTP/1.1 400 Bad Request\r
**** c1    rxhdr|\r
**** c1    rxhdrlen = 28
**** c1    http[ 0] |HTTP/1.1
**** c1    http[ 1] |400
**** c1    http[ 2] |Bad Request
**** c1    bodylen = 0
**   c1    === expect resp.status == 400
**** c1    EXPECT resp.status (400) == "400" match
***  c1    closing fd 15
**   c1    Ending
**   top   === varnish v1 -expect client_req_400 == 1
**** dT    0.691
---- c1    Assert error in vsc_fill_point(), vsc.c line 240:  Condition(vt->type == VJSN_STRING) not true.  Errno=0 Undefined error: 0
*    top   RESETTING after ./tests/b00037.vtc
**   v1    Wait
**** v1    CLI TX|panic.clear
***  v1    CLI RX  300
**** v1    CLI RX|No panic to clear
***  v1    debug|Info: manager stopping child
***  v1    debug|Debug: Stopping Child
**** dT    0.778
**** v1    vsl|       1000 Begin           c sess 0 HTTP/1
**** v1    vsl|       1000 SessOpen        c 127.0.0.1 57419 a0 127.0.0.1 57411 1635192619.409869 25
**** v1    vsl|       1000 Link            c req 1001 rxreq
**** v1    vsl|       1001 Begin           c req 1000 rxreq
**** v1    vsl|       1001 Timestamp       c Start: 1635192619.410037 0.000000 0.000000
**** v1    vsl|       1001 Timestamp       c Req: 1635192619.410037 0.000000 0.000000
**** v1    vsl|       1001 HttpGarbage     c GET
**** v1    vsl|       1001 RespProtocol    c HTTP/1.1
**** v1    vsl|       1001 RespStatus      c 400
**** v1    vsl|       1001 RespReason      c Bad Request
**** v1    vsl|       1001 ReqAcct         c 40 0 40 28 0 28
**** v1    vsl|       1001 End             c 
**** v1    vsl|       1000 SessClose       c RX_JUNK 0.000
**** v1    vsl|       1000 End             c 
**** v1    vsl|          0 CLI             - EOF on CLI connection, worker stops
**** dT    1.696
***  v1    debug|Info: Child (5764) ended
**** dT    1.697
***  v1    debug|Info: Child (5764) said Child dies
***  v1    debug|Debug: Child cleanup complete
***  v1    debug|Info: manager dies
**** dT    1.699
**** v1    STDOUT EOF
**** dT    1.708
**   v1    WAIT4 pid=5754 status=0x0000 (user 0.269020 sys 0.107804)
*    top   TEST ./tests/b00037.vtc FAILED
#    top  TEST ./tests/b00037.vtc FAILED (1.710) exit=2

dave-shawley avatar Oct 25 '21 20:10 dave-shawley

FWIW, I am seeing this on macOS with 6.6.1 after running autogen. The tests are still running but the failures so far have been similar to

That appears to be a Homebrew build issue, see https://github.com/Homebrew/homebrew-core/issues/86516#issuecomment-955719219. When building from source without homebrew's build rig the tests pass:

# assumption: all varnish build dependencies have been installed
# install additional autotools dependency
brew install automake autoconf autoconf-archive libtool
dlfile=$(mktemp -t varnish).tgz
curl -sL https://github.com/varnishcache/varnish-cache/archive/refs/tags/varnish-7.0.0.tar.gz -o ${dlfile}
builddir=$(mktemp -dt varnishbuild)
cd ${builddir}
tar xzf ${dlfile} --strip-components=1
# fix for varnishcache/varnish-cache#3699, 7.0 fails to build on macos 10.15
curl -s https://github.com/varnishcache/varnish-cache/commit/20e007a5b17c1f68f70ab42080de384f9e192900.diff | patch -p1
./autogen.sh
./configure --disable-dependency-tracking --prefix=$(brew --prefix varnish) --localstatedir=$(brew --prefix)/var
make
make check  # runs the same tests

mjpieters avatar Oct 31 '21 15:10 mjpieters

FWIW, I am seeing this on macOS with 6.6.1 after running autogen. The tests are still running but the failures so far have been similar to

More detail on this: It's not a homebrew issue, but a libtool issue in combination with MacOS 11. The toolchain used to generate a varnish source tarball is.. older, running on CentOS 7, and doesn't include fixes needed to handle the change in major version. If you still see these errors even after running autogen, it means you also need to upgrade libtool.

You can also apply a patch to configure to fix this issue.

mjpieters avatar Nov 02 '21 16:11 mjpieters

We could consider bootstrapping release archives with the latest autotools components (or reasonably closer to latest) but macos is not a tier1 platform for us.

dridi avatar Nov 08 '21 07:11 dridi

We could consider bootstrapping release archives with the latest autotools components (or reasonably closer to latest) but macos is not a tier1 platform for us.

That MacOS is not tier1 is fine, I'm learning just enough about autotools to keep the homebrew packages in shape. :-) Updating the toolchain to something a bit more recent would be a huge help however, and not just for MacOS!

(and we use MacOS as a dev environment, not the production environment; we use it to validate VCL and vmod configurations)

mjpieters avatar Nov 08 '21 10:11 mjpieters

I'm not sure that the most recent libtool actually works with macOS 11. I expected that line 1079 of libtool.m4 to mention 11.*. I did tweak configure locally and the tests seem to pass (still running). I also do no use homebrew and I'm still tracking down where in the autoconf/automake/libtool toolchain the fix actually should be done at.

@mjpieters I would patch configure in so that the lt_cv_ld_force_load calculation contains:

      10.*|11.*)
        _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;

dave-shawley avatar Nov 08 '21 12:11 dave-shawley

@dave-shawley see the homebrew patch for configure; it doesn't match on 11, rather on darwin10.* and a branch for darwin*. There is a whole directory with different versions of the same at https://github.com/Homebrew/formula-patches/tree/master/libtool/

mjpieters avatar Nov 10 '21 01:11 mjpieters

I finally got both time to look into this and more importantly access to a macos system, and I was able to reproduce the 165 failing tests:

$ curl -o varnish-7.0.0.tgz https://varnish-cache.org/_downloads/varnish-7.0.0.tgz
$ tar xf varnish-7.0.0.tgz
$ cd varnish-7.0.0
$ ./configure
$ make -sj32 check

What is going wrong is provisions made by libtool based on the OS version that were a bit overreaching. I was able to work around this by simply disabling libtool system auto-detection:

$ make distclean
$ ./configure --build=arm-apple-darwin10.5
$ make -sj32 check

The rest of the test suite passes for me, except b00059.vtc that runs into a panic:

***  v1    debug|Error: Child (8372) Panic at: Mon, 25 Jul 2022 16:41:42 GMT
***  v1    debug|Assert error in Lck_CondWait(), cache/cache_lck.c line 248:
***  v1    debug|  Condition(errno == 0 || errno == ETIMEDOUT || errno == EINTR) not true.
***  v1    debug|version = varnish-7.0.0 revision 454733b82a3279a1603516b4f0a07f8bad4bcd55, vrt api = 14.0

We recently fixed this for the 50th final time in a53079e385abcfba4bfe88414a52e94c1dc444c5 so let's hope there won't be a 51th. Running the test case again (make recheck) succeeded.

The "automatic" parts of autotools are supposed to be overridable when they aren't future-proof enough and this one was no exception. I understand the inconvenience, but there should be no need to bootstrap new release archives for older releases, and for future releases we should add to the release notes procedure that all the steps up to make dist should be made on a system with latest autotools packages. It was the case for the 7.1.0 release and on the same macos system I have no problem with the following steps:

$ curl -o varnish-7.1.0.tgz https://varnish-cache.org/_downloads/varnish-7.1.0.tgz
$ tar xf varnish-7.1.0.tgz
$ cd varnish-7.1.0
$ ./configure
$ make -sj32 check

Keeping open until we agree to add this rule to the release procedure during a bugwash.

Apologies for the long delay.

*I made a few edits after noticing inconsistencies in my comment

dridi avatar Jul 25 '22 17:07 dridi

we should add to the release notes procedure that all the steps up to make dist should be made on a system with latest autotools packages.

I'm uncertain what you mean by "latest autotools packages". The latest GNU release of these tools? Or the latest packages available in the releng's distro? Or the latest in the latest release of a specific distro?

I'm guessing the problem disappearing from Varnish 7.0.0 to 7.1.0 coincides with an upgrade from Debian Buster to Debian Bullseye, and the new autotools packages that came as the result.

mbgrydeland avatar Aug 03 '22 12:08 mbgrydeland

I mean the latest packages available in the releng distro, we should definitely NOT install our own autotools suite.

And by that I mean making sure the releng distro is the "latest" of whatever we pick. Right now the CI dist job runs on centos:7, so the reason why 7.1 doesn't have this problem is that I personally bootstrapped it from fedora:latest, whatever it was then. In other words, fedora:latest was my releng distro for 7.1, and this step is under-defined in the release process.

I hope I didn't make it more confusing :)

dridi avatar Aug 03 '22 12:08 dridi

ETIMEDOUT

dridi avatar Oct 17 '22 13:10 dridi