Add a special-purpose synth storage engine
This PR includes #4358 as of 576e2a50f98bcd030372bebddce04628956a685e
Before this patch, creating a response body in vcl_synth {} involved two memcpy() operations to heap memory: first into a VSB, then into a storage object.
The new "synth" storage engine, together with special-casing in cnt_synth() and VRT_l_resp_body(), simplifies this drastically: the constituents of the response body are not copied but referenced in a list of VSCARABs, which are then used directly for delivery.
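The reference-instead-of-copy idea can be sketched in plain C as follows. This is only an illustration under assumptions: `struct synth_parts` and the `synth_*` functions are hypothetical names, and POSIX `struct iovec` with `writev()` stands in for the VSCARAB list and the delivery code, whose actual layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for a list of referenced buffers (not the real VSCARAB). */
#define SYNTH_MAX_PARTS 8

struct synth_parts {
	struct iovec	iov[SYNTH_MAX_PARTS];
	int		used;
};

/* Record a reference to one body constituent; no bytes are copied. */
static int
synth_add(struct synth_parts *sp, const char *p, size_t len)
{
	assert(sp != NULL);
	if (sp->used >= SYNTH_MAX_PARTS)
		return (-1);
	sp->iov[sp->used].iov_base = (void *)p;
	sp->iov[sp->used].iov_len = len;
	sp->used++;
	return (0);
}

/* Total body length, e.g. for a Content-Length header. */
static size_t
synth_length(const struct synth_parts *sp)
{
	size_t l = 0;
	int i;

	for (i = 0; i < sp->used; i++)
		l += sp->iov[i].iov_len;
	return (l);
}

/* Deliver all referenced parts with one scatter-gather write. */
static ssize_t
synth_deliver(int fd, const struct synth_parts *sp)
{
	return (writev(fd, sp->iov, sp->used));
}
```

In this sketch the caller must keep the referenced strings alive until delivery has finished, which is the price for avoiding the copies.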
Besides this body handling, the synth storage engine only supports the bare minimum object API calls.
To accommodate the "hand out a VSCARAB" semantics instead of "here is a buffer to write to", ObjGetspace() is used in a special way that is incompatible with its usual semantics. We might want to consider adding a special-purpose object API instead.
Also, there currently is no way for storage functions to get hold of the request workspace directly, so it is retrieved via the pthread key.
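For illustration, the pthread-key pattern mentioned above looks roughly like this in portable POSIX C. `struct req_ctx`, `req_ctx_enter()` and `req_ctx_current()` are made-up names standing in for the request workspace and its accessors; the actual key and structures used by the patch are not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-request context, standing in for the request workspace. */
struct req_ctx {
	char	scratch[256];
	size_t	used;
};

static pthread_key_t req_key;
static pthread_once_t req_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once per process. */
static void
req_key_init(void)
{
	if (pthread_key_create(&req_key, NULL) != 0)
		abort();
}

/* The worker thread publishes its context before running request code. */
static int
req_ctx_enter(struct req_ctx *rc)
{
	(void)pthread_once(&req_key_once, req_key_init);
	return (pthread_setspecific(req_key, rc));
}

/*
 * Code whose function signature carries no request pointer (such as a
 * storage callback) can still look up the calling thread's context.
 */
static struct req_ctx *
req_ctx_current(void)
{
	(void)pthread_once(&req_key_once, req_key_init);
	return (pthread_getspecific(req_key));
}
```

The lookup is thread-local, so it only yields the right context when the callback runs on the thread that is handling the request.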
For buffers, simple malloc()/free() is used.
Performance numbers will be coming.
Performance tests, run with this VCL (t.vcl):
vcl 4.1;

backend none none;

sub vcl_synth {
	set resp.body = "42";
	return (deliver);
}

sub vcl_recv {
	return (synth(200));
}
/tmp/sbin/varnishd -a 127.0.0.1:8080 -f $PWD/t.vcl -n /tmp/t
trunk 8cbf914a106e19bbec12f0fcd5c78348b2f52828
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.05ms    7.35ms  326.92ms   99.65%
    Req/Sec     2.70k   161.09      6.41k    91.51%
  8070232 requests in 30.11s, 1.04GB read
Requests/sec: 268046.60
Transfer/sec: 35.44MB
this PR e46bfe88da30f798803b218182a74091295773d0
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.49ms   13.07ms  472.68ms   99.42%
    Req/Sec     2.77k   266.63     11.98k    92.38%
  8264773 requests in 30.10s, 1.07GB read
Requests/sec: 274569.99
Transfer/sec: 36.31MB
this PR + #4364
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.55ms    4.36ms  198.83ms   95.27%
    Req/Sec     5.15k     1.58k    19.93k    70.00%
  15407779 requests in 30.09s, 1.99GB read
Requests/sec: 511986.31
Transfer/sec: 67.70MB
this PR + #4073 https://github.com/nigoroll/varnish-cache/tree/stv_synth_partial_nocache d46355cec4c1ee09188130cf2b4c290bdb057b0b
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.02ms    7.34ms  279.16ms   99.49%
    Req/Sec     2.77k   194.14      8.65k    91.47%
  8267057 requests in 30.10s, 1.07GB read
Requests/sec: 274647.94
Transfer/sec: 36.32MB
what's left
The remaining inefficiencies are related to the VSL mutex (mtx). Completely disabling VSL at compile time brings the number up to ~1.8M req/s.