varnish-cache icon indicating copy to clipboard operation
varnish-cache copied to clipboard

suspect unbuffered VSL regression

Open nigoroll opened this issue 2 years ago • 3 comments

Expected Behavior

good VSL

Current Behavior

garbled VSL:

$ tail -20 tests/permutations.log 
**** v1    vsl|       1024 Debug           c vped_include: enter
**** v1    vsl|       1024 Debug           c vped_include: new req=0x7fe0f87508e0
**** v1    vsl|       1024 Debug           c node_insert: node=0x7fe0fdaa4020 parent=0x7fe0f4a0aab0 parent->state=2 parent->npending_private=2 npending=0
**** v1    vsl|       1024 Debug           c vped_include: new xid=1720
**** v1    vsl|       1024 Debug           c vped_include: preq->vdc=0x7fe0f4a0a5d0 nxt=0x7fe0f4a0b938
**** v1    vsl|       1024 Debug           c vped_include: attempt new thread req=0x7fe0f87508e0
**** v1    vsl|       1024 Debug           c vped_vdp: vped_include()=1
**** v1    vsl|       1024 Debug           c node_insert: node=0x7fe0f4a0aee8 parent=0x7fe0f4a0aa38 parent->state=2 parent->npending_private=11 npending=0
**** v1    vsl|       1024 Debug           c vped_vdp: pecx->state=4
**** v1    vsl|       1024 Debug           c vped_vdp: call vped_include incl_cont=0
**** v1    vsl|       1024 Debug           c vped_include: enter
**** v1    vsl|       1024 Debug           c vped_include: new req=0x7fe0f871e120
**** v1    vsl|       1024 Debug           c vped_include: new xid=1723
**** v1    vsl| 1255008291940703 (null)          c : attempt new thread req=0x7fe0f871e120
**** v1    vsl| 281685430108208 (null)          - 
**** v1    vsl| 1533167834003301 (null)          c d req=0x7fe0ef3e2d20
*    diag  0.0 Assert error in vslc_vsm_next(), vsl_cursor.c line 143:
*    diag  0.0   Condition((t) != 0) not true. (errno=0 Success)
#    top  TEST ./tests/permutations.vtc FAILED (2.489) signal=6
FAIL tests/permutations.vtc (exit status: 2)

Possible Solution

No response

Steps to Reproduce (for bugs)

No response

Context

Running VTCs of https://gitlab.com/uplex/varnish/libvdp-pesi with -DDEBUG makes the code issue a lot of unbuffered VSL

Varnish Cache version

a84b3c9416890a3f32e25ce4f6d6c87e40b10c79

Operating system

Linux 6.0.0-4-amd64

Source of binary packages used (if any)

No response

nigoroll avatar Jun 12 '23 17:06 nigoroll

I've spent some time with this today and noticed two things on the way, which are most likely not the root cause here: Regarding the garbled vxids, I first thought that the cause could be f7767df01b0ab17d242c4eeb4d92389a42bfae06 which apparently was overlooked with the 64bit change. But firstly, that does not explain the garbled tag (null) in the VSL output, and secondly the issue is still reproducible with this change. Some thoughts:

  • Could this be an overrun? Unlikely I would say because the varnishd default is 80m and the vtc is running with -b 20m

  • Could this be an old issue which only surfaces with unbuffered VSL? Probably yes, I would have to try to reproduce it with old code

  • Could this have to do with the fact that our VSL header went from 2 to 3 32bit words, so it is not a multiple of 64bit any more?

  • Could this be some kind of spurious read / spill?

nigoroll avatar Jun 13 '23 17:06 nigoroll

bugwash: it could be an overrun because vtc_varnish uses -l 2m. TODO: check for overruns.

nigoroll avatar Jun 19 '23 13:06 nigoroll

I've reset vsl_space back to the 80m default and checked that the setting is applied. The problem persists.

***  v1    CLI RX  200
**** v1    CLI RX|vsl_space
**** v1    CLI RX|        Value is: 80M [bytes] (default)
**** v1    CLI RX|        Minimum is: 1M
**** v1    CLI RX|        Maximum is: 4G
...
**** v1    vsl|       2731 Debug           c vped_include: new xid=3100
**** v1    vsl|       2731 Debug           c vped_include: preq->vdc=0x7f51db2e6ed0 nxt=0x7f51db2e8240
**** v1    vsl|       2731 Debug           c vped_include: attempt new thread req=0x7f51d73b5020
**** v1    vsl| 1533137702105185 (null)          c t->state=2 parent->npending_private=2 npending=0
**** v1    vsl| 281848638865408 (null)          - 
*    diag  0.0 Assert error in vslc_vsm_next(), vsl_cursor.c line 143:
*    diag  0.0   Condition((t) != 0) not true. (errno=0 Success)
#    top  TEST ./tests/permutations.vtc FAILED (3.870) signal=6
FAIL tests/permutations.vtc (exit status: 2)

Reminder: To reproduce, compile with -DDEBUG

nigoroll avatar Jun 26 '23 19:06 nigoroll

FTR: the test case in question does not exercise h2, so any h2 concurrency issues should not be related to the cause

nigoroll avatar Apr 29 '24 13:04 nigoroll

Triggered by bugwash, I think I found the root cause in pESI: https://gitlab.com/uplex/varnish/libvdp-pesi/-/commit/32b2be11f3696f9c33908c89d990e16e4f99e135

nigoroll avatar Apr 29 '24 13:04 nigoroll