suspect unbuffered VSL regression
Expected Behavior
good VSL
Current Behavior
garbled VSL:
$ tail -20 tests/permutations.log
**** v1 vsl| 1024 Debug c vped_include: enter
**** v1 vsl| 1024 Debug c vped_include: new req=0x7fe0f87508e0
**** v1 vsl| 1024 Debug c node_insert: node=0x7fe0fdaa4020 parent=0x7fe0f4a0aab0 parent->state=2 parent->npending_private=2 npending=0
**** v1 vsl| 1024 Debug c vped_include: new xid=1720
**** v1 vsl| 1024 Debug c vped_include: preq->vdc=0x7fe0f4a0a5d0 nxt=0x7fe0f4a0b938
**** v1 vsl| 1024 Debug c vped_include: attempt new thread req=0x7fe0f87508e0
**** v1 vsl| 1024 Debug c vped_vdp: vped_include()=1
**** v1 vsl| 1024 Debug c node_insert: node=0x7fe0f4a0aee8 parent=0x7fe0f4a0aa38 parent->state=2 parent->npending_private=11 npending=0
**** v1 vsl| 1024 Debug c vped_vdp: pecx->state=4
**** v1 vsl| 1024 Debug c vped_vdp: call vped_include incl_cont=0
**** v1 vsl| 1024 Debug c vped_include: enter
**** v1 vsl| 1024 Debug c vped_include: new req=0x7fe0f871e120
**** v1 vsl| 1024 Debug c vped_include: new xid=1723
**** v1 vsl| 1255008291940703 (null) c : attempt new thread req=0x7fe0f871e120
**** v1 vsl| 281685430108208 (null) -
**** v1 vsl| 1533167834003301 (null) c d req=0x7fe0ef3e2d20
* diag 0.0 Assert error in vslc_vsm_next(), vsl_cursor.c line 143:
* diag 0.0 Condition((t) != 0) not true. (errno=0 Success)
# top TEST ./tests/permutations.vtc FAILED (2.489) signal=6
FAIL tests/permutations.vtc (exit status: 2)
Possible Solution
No response
Steps to Reproduce (for bugs)
No response
Context
Running VTCs of https://gitlab.com/uplex/varnish/libvdp-pesi with -DDEBUG makes the code issue a lot of unbuffered VSL
Varnish Cache version
a84b3c9416890a3f32e25ce4f6d6c87e40b10c79
Operating system
Linux 6.0.0-4-amd64
Source of binary packages used (if any)
No response
I've spent some time with this today and noticed two things on the way, which are most likely not the root cause here:
Regarding the garbled vxids, I first thought that the cause could be f7767df01b0ab17d242c4eeb4d92389a42bfae06 which apparently was overlooked with the 64bit change. But firstly, that does not explain the garbled tag (null) in the VSL output, and secondly the issue is still reproducible with this change.
Some thoughts:
-
Could this be an overrun? Unlikely I would say because the varnishd default is 80m and the vtc is running with
-b 20m -
Could this be an old issue which only surfaces with unbuffered VSL? Probably yes, I would have to try to reproduce it with old code
-
Could this have to do with the fact that our VSL header went from 2 to 3 32bit words, so it is not a multiple of 64bit any more?
-
Could this be some kind of spurious read / spill?
bugwash: it could be an overrun because vtc_varnish uses -l 2m.
TODO: check for overruns.
I've reset vsl_space back to the 80m default and checked that the setting is applied. The problem persists.
*** v1 CLI RX 200
**** v1 CLI RX|vsl_space
**** v1 CLI RX| Value is: 80M [bytes] (default)
**** v1 CLI RX| Minimum is: 1M
**** v1 CLI RX| Maximum is: 4G
...
**** v1 vsl| 2731 Debug c vped_include: new xid=3100
**** v1 vsl| 2731 Debug c vped_include: preq->vdc=0x7f51db2e6ed0 nxt=0x7f51db2e8240
**** v1 vsl| 2731 Debug c vped_include: attempt new thread req=0x7f51d73b5020
**** v1 vsl| 1533137702105185 (null) c t->state=2 parent->npending_private=2 npending=0
**** v1 vsl| 281848638865408 (null) -
* diag 0.0 Assert error in vslc_vsm_next(), vsl_cursor.c line 143:
* diag 0.0 Condition((t) != 0) not true. (errno=0 Success)
# top TEST ./tests/permutations.vtc FAILED (3.870) signal=6
FAIL tests/permutations.vtc (exit status: 2)
Reminder: To reproduce, compile with -DDEBUG
FTR: the test case in question does not exercise h2, so any h2 concurrency issues should not be related to the cause
Triggered by bugwash, I think I found the root cause in pESI: https://gitlab.com/uplex/varnish/libvdp-pesi/-/commit/32b2be11f3696f9c33908c89d990e16e4f99e135