
Add a special-purpose synth storage engine

Open • nigoroll opened this issue 7 months ago • 1 comment

This PR includes #4358 as of 576e2a50f98bcd030372bebddce04628956a685e

Before this patch, creating a response body in vcl_synth {} involved two memcpy() operations to the heap: first into a vsb, then into a storage object.

The new "synth" storage engine simplifies this drastically, in tandem with special casing in cnt_synth() and VRT_l_resp_body(): the constituents of the response are not copied, but rather referenced in a list of VSCARABs, which are then used directly for delivery.
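To make "reference instead of copy" concrete, here is a minimal sketch that assumes POSIX `writev()` as the delivery primitive. `struct body_refs` and the `body_refs_*` helpers are hypothetical stand-ins for a list of VSCARABs; they are not Varnish's actual types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical stand-in for a list of body references. This is NOT
 * Varnish's VSCARAB type, only an illustration of the idea.
 */
#define BODY_MAX_PARTS	8

struct body_refs {
	struct iovec	iov[BODY_MAX_PARTS];
	int		n;
};

/* Record where a body part lives instead of copying its bytes. */
static void
body_refs_add(struct body_refs *br, const char *p, size_t len)
{
	assert(br->n < BODY_MAX_PARTS);
	/* writev() only reads through iov_base, so casting away const is safe. */
	br->iov[br->n].iov_base = (void *)p;
	br->iov[br->n].iov_len = len;
	br->n++;
}

/* Total body length, e.g. for a Content-Length header. */
static size_t
body_refs_len(const struct body_refs *br)
{
	size_t len = 0;

	for (int i = 0; i < br->n; i++)
		len += br->iov[i].iov_len;
	return (len);
}

/* Deliver all parts with one writev() call; no intermediate buffer. */
static ssize_t
body_refs_send(int fd, const struct body_refs *br)
{
	return (writev(fd, br->iov, br->n));
}
```

Adding "4" and "2" as two parts yields a 2-byte body whose `iov_base` pointers still point at the caller's strings: nothing was copied before delivery.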

Besides this body handling, the synth storage engine only supports the bare minimum object API calls.

To accommodate the "hand out a VSCARAB" semantics instead of "here is a buffer to write to", ObjGetspace() is used in an incompatible, special way. We might want to consider adding a special-purpose object API instead.
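For contrast, the conventional "here is a buffer to write to" contract can be sketched as a getspace/extend loop. The `buf_*` names and the fixed 64-byte object are hypothetical and only loosely modeled on the idea behind `ObjGetspace()`, not on its real signature. The `memcpy()` in the loop is the kind of per-body copy that handing out references avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object with a fixed-size body buffer. */
struct buf_obj {
	char	data[64];
	size_t	used;
};

/* "Here is a buffer to write to": hand out the free tail of the buffer. */
static int
buf_getspace(struct buf_obj *o, size_t *sz, char **ptr)
{
	if (o->used >= sizeof o->data)
		return (0);
	*ptr = o->data + o->used;
	*sz = sizeof o->data - o->used;
	return (1);
}

/* Commit len bytes that the caller wrote into the space handed out. */
static void
buf_extend(struct buf_obj *o, size_t len)
{
	assert(o->used + len <= sizeof o->data);
	o->used += len;
}

/*
 * Copy src into the object. Returns -1 once the buffer is full; bytes
 * written up to that point stay committed.
 */
static int
buf_write(struct buf_obj *o, const char *src, size_t len)
{
	size_t sz;
	char *ptr;

	while (len > 0) {
		if (!buf_getspace(o, &sz, &ptr))
			return (-1);
		if (sz > len)
			sz = len;
		memcpy(ptr, src, sz);	/* the copy a reference list avoids */
		buf_extend(o, sz);
		src += sz;
		len -= sz;
	}
	return (0);
}
```

In this style the storage owns the bytes and the caller must copy into it; in the "hand out" style the caller keeps ownership and the storage only records where the bytes are, which is why the two semantics do not fit one function contract.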

Also, there is currently no way for storage functions to get hold of the request workspace directly, so it is retrieved via the pthread key.
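Retrieving the workspace via a pthread key follows the standard POSIX thread-specific data pattern (`pthread_key_create()`, `pthread_setspecific()`, `pthread_getspecific()`). A minimal sketch, where `struct workspace` and the `ws_*` helpers are hypothetical and not Varnish's real workspace API:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-request workspace; not Varnish's struct ws. */
struct workspace {
	char	buf[256];
	size_t	used;
};

static pthread_key_t ws_key;
static pthread_once_t ws_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once per process. */
static void
ws_key_init(void)
{
	int r = pthread_key_create(&ws_key, NULL);

	assert(r == 0);
	(void)r;
}

/* Called by the worker before running code that needs the workspace. */
static void
ws_set_current(struct workspace *ws)
{
	int r;

	(void)pthread_once(&ws_key_once, ws_key_init);
	r = pthread_setspecific(ws_key, ws);
	assert(r == 0);
	(void)r;
}

/* Called from code whose signature has no workspace argument. */
static struct workspace *
ws_get_current(void)
{
	(void)pthread_once(&ws_key_once, ws_key_init);
	return (pthread_getspecific(ws_key));
}
```

A worker would call `ws_set_current()` before running request code, and a storage function without a workspace parameter can then call `ws_get_current()`. The cost is a hidden dependency on which thread happens to be executing the storage call.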

For buffers, simple malloc()/free() is used.

Performance numbers will follow.

nigoroll, Jul 11 '25 13:07

Performance tests, with the following VCL in t.vcl:

```vcl
vcl 4.1;

backend none none;

sub vcl_synth {
	set resp.body = "42";
	return (deliver);
}

sub vcl_recv {
	return (synth(200));
}
```

```
/tmp/sbin/varnishd -a 127.0.0.1:8080 -f $PWD/t.vcl -n /tmp/t
```

trunk 8cbf914a106e19bbec12f0fcd5c78348b2f52828

```
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.05ms    7.35ms 326.92ms   99.65%
    Req/Sec     2.70k   161.09     6.41k    91.51%
  8070232 requests in 30.11s, 1.04GB read
Requests/sec: 268046.60
Transfer/sec:     35.44MB
```

this PR e46bfe88da30f798803b218182a74091295773d0

```
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.49ms   13.07ms 472.68ms   99.42%
    Req/Sec     2.77k   266.63    11.98k    92.38%
  8264773 requests in 30.10s, 1.07GB read
Requests/sec: 274569.99
Transfer/sec:     36.31MB
```

this PR + #4364

```
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.55ms    4.36ms 198.83ms   95.27%
    Req/Sec     5.15k     1.58k   19.93k    70.00%
  15407779 requests in 30.09s, 1.99GB read
Requests/sec: 511986.31
Transfer/sec:     67.70MB
```

this PR + #4073 https://github.com/nigoroll/varnish-cache/tree/stv_synth_partial_nocache d46355cec4c1ee09188130cf2b4c290bdb057b0b

```
$ ulimit -n $((1<<20)) ; wrk -c 1000 -d 30 -t 100 http://127.0.0.1:8080
Running 30s test @ http://127.0.0.1:8080
  100 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.02ms    7.34ms 279.16ms   99.49%
    Req/Sec     2.77k   194.14     8.65k    91.47%
  8267057 requests in 30.10s, 1.07GB read
Requests/sec: 274647.94
Transfer/sec:     36.32MB
```
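The Requests/sec figures from these runs are easier to compare as ratios against trunk. A small sketch using only the numbers reported by wrk above; the variable names are illustrative.

```c
#include <assert.h>

/* Requests/sec as reported by wrk in the runs above. */
static const double rps_trunk = 268046.60;	/* trunk 8cbf914a */
static const double rps_pr = 274569.99;		/* this PR */
static const double rps_pr_4364 = 511986.31;	/* this PR + #4364 */
static const double rps_pr_4073 = 274647.94;	/* this PR + #4073 */

/* Throughput relative to a baseline; 1.0 means no change. */
static double
speedup(double rps, double baseline)
{
	assert(baseline > 0.0);
	return (rps / baseline);
}
```

With these values, `speedup(rps_pr, rps_trunk)` is about 1.024 (roughly +2.4%), `speedup(rps_pr_4073, rps_trunk)` about 1.025, and `speedup(rps_pr_4364, rps_trunk)` about 1.91: in these single runs, the large jump appears only when this PR is combined with #4364.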

what's left

The remaining inefficiencies are related to the VSL mutex (mtx). Completely disabling VSL at compile time brings throughput up to ~1.8M req/s.

nigoroll, Jul 11 '25 14:07