Add (be)req.trace and feature +trace to control generation of VCL_trace logs
This PR is the implementation of the bugwash agreement in https://github.com/varnishcache/varnish-cache/pull/3836#issuecomment-1230335147
I am opening a new PR to properly document the performance implications of the alternative approaches. I ran the max potential benchmark from #3515 again on master, on #3836, on this branch, and on a variant of this branch with branch prediction hints enabled. The results are given below as quantiles of Timestamp:Process times for 100000 requests:
My conclusion is that, while conditionally (not) compiling the tracing calls into VGC remains the fastest option, the runtime check from this PR is still substantially faster than the original code on master. If we consider a runtime tracing switch an important feature (which we agreed it is), then this PR is the way to go.
master 886ab87c87bc7990be3c316feea1c1a117c5365d
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|
| 0.000383 | 0.000424 | 0.000442 | 0.000454 | 0.000461 | 0.000469 | 0.000477 | 0.000488 | 0.000515 |
this PR (be)req.trace with runtime check, no branch prediction hints https://github.com/nigoroll/varnish-cache/commit/0b41933926fc502872aa625086e93a9fd3ee9e23
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|
| 0.000201 | 0.000222 | 0.000233 | 0.000241 | 0.000247 | 0.000252 | 0.000258 | 0.000266 | 0.000280 |
this PR (be)req.trace with runtime check and branch prediction hints https://github.com/nigoroll/varnish-cache/commit/165db63002c455ef355e07d30ea40c515137bb46
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|
| 0.000198 | 0.000219 | 0.000230 | 0.000238 | 0.000244 | 0.000250 | 0.000256 | 0.000264 | 0.000278 |
PR #3836 vcc "hardcoded" tracing code https://github.com/nigoroll/varnish-cache/commit/8e6f877f0053e42a8bc8eb0129d7aae21db2aed3
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
|---|---|---|---|---|---|---|---|---|
| 0.000189 | 0.000209 | 0.000222 | 0.000230 | 0.000236 | 0.000242 | 0.000248 | 0.000256 | 0.000271 |
Description from the commit message:
To potentially emit VCL_trace VSL records, we called VPI_count() from VGC for, in the extreme case, every line of VCL.
For a normal setup, VPI_count() would call into VSL only to find out that VCL_trace is masked, and thus have no effect.
Issuing two additional function calls for each line of VCL is obviously something we should avoid unless there is a real benefit.
To avoid the overhead, we move the check for the tracing flag to VGC and decorate it with UNLIKELY(), which has no effect for now but is expected to become a branch prediction hint soon.
This should bring the cost down close to zero.
While we are at it, we also add VCL control over tracing through req.trace and bereq.trace: req.trace is initialized from the feature flag trace, and bereq.trace inherits its value from req.trace. For now, tracing of the housekeeping subs vcl_init and vcl_fini can only be controlled through feature trace.
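A minimal VCL sketch of how the new variables could be used, assuming the defaults described above (the backend definition and the X-Debug-Trace header are illustrative only):

```vcl
vcl 4.1;

backend be { .host = "127.0.0.1"; .port = "8080"; }

sub vcl_recv {
    # req.trace starts out as the value of the "trace" feature flag
    # (toggled e.g. via "param.set feature +trace" in varnishadm).
    # Here we override it per request: trace only debug-tagged traffic.
    if (req.http.X-Debug-Trace) {
        set req.trace = true;
    } else {
        set req.trace = false;
    }
}

sub vcl_backend_fetch {
    # bereq.trace inherits req.trace, but can be overridden on the
    # backend side, e.g. to silence tracing for fetches.
    set bereq.trace = false;
}
```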