
Add "via" backends (with endpoint preamble)

Open nigoroll opened this issue 6 years ago • 15 comments

FTR: This PR is a continuation of prior work in #2850, #2957 and #3127, now rebased onto the new endpoint-preamble code as of 3f077667d47338dcf1d593d6765b01151968072c.

This suggestion can be seen as a convenient and simple way to configure a specific use case for a preamble: with a PROXY preamble, we can instruct a peer to make a connection on our behalf.

via says: "I have this real backend; connect to it via this other backend".

To illustrate the main use case, here's an updated version of the text from #2850:


In varnish-cache, the deliberate decision has been made not to support TLS in the same address space as varnish itself; see https://github.com/varnishcache/varnish-cache/blob/master/doc/sphinx/phk/ssl_again.rst

So the obvious way to connect to TLS backends is to use a TLS "onloader" (a term coined by @slimhazard as the opposite of "offloader"), which turns a cleartext connection into a TLS connection to some fixed, configured address.

This requires additional configuration in several places: an address/port or UDS path needs to be uniquely allocated for the onloader, the specific onloader configuration has to be put in place, and a varnish backend needs to be defined for the onloader. All of this is needed for each individual backend. This requirement also prevents any use of dynamic backends with a TLS onloader.

haproxy, however, offers a convenient and elegant way to avoid this configuration overhead: the PROXY protocol can also be used to transport the destination address that haproxy is to connect to when a server's address is unspecified (INADDR_ANY / 0.0.0.0). The configuration template for this use case looks like this (huge thanks to @wtarreau for pointing out this great haproxy option):

    listen sslon
            mode    tcp
            maxconn 1000
            bind    /my/path/to/ssl_onloader accept-proxy mode 777
            stick-table type ip size 100
            stick   on dst
            server  s00 0.0.0.0:443 ssl ca-file /etc/ssl/certs/ca-bundle.crt alpn http/1.1 sni fc_pp_authority
            server  s01 0.0.0.0:443 ssl ca-file /etc/ssl/certs/ca-bundle.crt alpn http/1.1 sni fc_pp_authority
            server  s02 0.0.0.0:443 ssl ca-file /etc/ssl/certs/ca-bundle.crt alpn http/1.1 sni fc_pp_authority
            server  s03 0.0.0.0:443 ssl ca-file /etc/ssl/certs/ca-bundle.crt alpn http/1.1 sni fc_pp_authority
            # ...

With this setup, by connecting to /my/path/to/ssl_onloader and sending the address to make a TLS connection to in a PROXY header (as the server address/port), we can substantially reduce the configuration overhead outside varnish. In particular, we no longer need a path/port per destination and regain the option to use dynamic backends; simple backend configs suffice:

backend sslon {
    .path = "/my/path/to/ssl_onloader";
}
backend a {
    .host = "1.2.3.4";
    .via = sslon;
}
backend b {
    .host = "4.5.6.7";
    .via = sslon;
}

# use backends a and b as normal...

As seen in the example, backends can be created with an additional "via" director, which has to resolve to a simple backend. Varnish then connects to the "via" backend and sends the actual endpoint address in an additional PROXY preamble.

nigoroll avatar Nov 13 '19 14:11 nigoroll

Uhm, this is quite a grab-bag of different stuff...

Do you seriously propose that all this should be reviewed/approved as a single unit ?!

bsdphk avatar Dec 02 '19 09:12 bsdphk

FTR: clarified with @bsdphk via IRC: I should have written a better description. He proposed that the preamble should be in the backend, but putting it in the tcp pool has some merits. I tried both and would like to ask for opinions. The relevant commit is https://github.com/varnishcache/varnish-cache/commit/4f58c95cd1dc7b3d130e7594b75ccdfc6ae8d70d; the other commits are basically analogous, adjusted accordingly. I think the main argument for the preamble in the tcp pool is somewhat less duplicated code between the backend and probe code, and one less race between them. Also, by design, I think it actually makes sense to consider the preamble a tcp connection function; it would be the same for h2 backends, for example.

nigoroll avatar Dec 02 '19 09:12 nigoroll

Also, by design, I think it actually makes sense to consider the preamble a tcp connection function

I agree on this point, maybe we could move in this direction first and review that change in isolation?

dridi avatar Dec 02 '19 11:12 dridi

it would be the same for h2 backends, for example

Or even HTTPS (a requirement for h2) backends could benefit from having the TLS handshake as a kind of preamble.

dridi avatar Dec 02 '19 11:12 dridi

maybe we could move in this direction first and review that change in isolation?

@Dridi OK, but can we please tick off the rather trivial #2680 and #2845 first? It really does not make sense from my perspective to first have to wait for ages for reviews and then get criticized for additional work on top of the unmerged changes.

Or even HTTPS (a requirement for h2) backends could benefit from having the TLS handshake as a kind of preamble.

that is the use case of this PR (indirectly, via the PROXY header)

nigoroll avatar Dec 02 '19 11:12 nigoroll

I have updated the PR based on the new preamble-in-endpoint code by @bsdphk. I will rewrite the description to reflect the current state of affairs.

nigoroll avatar Feb 10 '21 10:02 nigoroll

As I see it, the benefits of having a .via facility are that we can share:

A) Health-probe of the intermediate

B) Normal director services (fail-over, load-balancing) of the intermediate.

What we cannot share is the connections, because the prefix makes it anyone's guess where they go.

@Dridi has argued that we need a new VCL_ENDPOINT type instead, and while that is cleaner from a data-type point of view, I cannot see how we can gain the director services that way.

We may be able to salvage the idea of .via this way:

Add optional VDI_get_fd() and VDI_return_fd() methods, which operate on the fd label.

It is not at all obvious to me that the director we call this on should have access to the transaction details, but it could obviously need some kind of "cookie" for load-balancing purposes.

Then we teach vbe_dir_getfd() (and friends) to use that API when a backend is defined using .via rather than .host.

Any backend which has any kind of .prefix attribute must pool the recyclable connections itself instead of returning them to the .via which initially supplied them.

Then, if you want to have two backends which pool their connections and one which does not you could do:

backend_intermediary { ... }
backend_pool_xyz {
    .via = backend_intermediary;
    .prefix = something_something(xyz);
}
backend_xyz_one {
    .via = backend_pool_xyz;
    ...
}
backend_xyz_two {
     .via = backend_pool_xyz;
     ...
}
backend_xyz_unpooled {
     .via = backend_intermediary;
     .prefix = something_something(xyz);
}       

All three backends would still share whatever services backend_intermediary offers.

One obvious problem is that backend_intermediary has no idea how many connections are currently open, because both backend_pool_xyz and backend_xyz_unpooled have their own VCP. That could be fixed with a third "I have closed the fd you gave me earlier" VDI method.

bsdphk avatar Aug 18 '21 08:08 bsdphk

FTR, after some discussion about the last comments:

  • @Dridi is right about test coverage which could (always) be improved

  • As he pointed out himself, I think using a BACKEND for via is simple and allows interesting future improvements

  • @bsdphk has a point that .max_connections should also be checked for the via backend

  • I specifically would want to avoid .prefix = something_something(xyz);. I think it is important that a via connection uses the same endpoint information as the actual backend itself.

  • Other than that I am not sure what to take away from his comment

nigoroll avatar Aug 23 '21 14:08 nigoroll

notes from irc: @Dridi has warmed up this topic again. I would appreciate it if @bsdphk could have another look at how we could make iterative progress with this. As mentioned before, sensible suggestions for further improvements are welcome and should be implemented if needed, but I would hope that we could start off with anything at all.

Also we agree that another future extension would be the choice of protocol which the via backend supports, e.g.

backend proxy { .via_protocol = CONNECT | SOCKS5 | PROXY2 }
backend destination { .via = proxy }

nigoroll avatar Dec 06 '21 14:12 nigoroll

We would like to see explicit coverage of via backend connection pooling, to be certain of the pooling behavior when multiple backends share the same via backend.

Test case added: 2cd481fc661b446b8b18b6e49859c338ab5a3cdd

One concern is also the use of the BACKEND type for the via property, and to some extent the arbitrary limitation against unix domain sockets.

There is no such limitation. c00042.vtc (see above) actually uses a TCP via backend.

nigoroll avatar Dec 06 '21 16:12 nigoroll

@bsdphk docs added in 5ee551f2a9bed94a4c037c9667ae58fa702417a8

nigoroll avatar May 24 '22 11:05 nigoroll

@daghf tested it and told me it compiles and works generally fine. I would rather artificially prevent this behavior first and consider allowing it when we have a strong case for it. In general I don't think we should expect explicit interaction with more than one downstream proxy. I'm also wondering what via stacking does to connection pooling since (I assume based on my review long ago) there will be only one preamble for a chain of two proxies.

dridi avatar Aug 30 '22 12:08 dridi

I would rather artificially prevent this behavior first and consider allowing it when we have a strong case for it.

@Dridi I originally intended this to work, but have disallowed it because of your veto in c73e1f9d13f6c33f994f4708e3d7d6a28704c9b4

nigoroll avatar Aug 30 '22 13:08 nigoroll

@Dridi I originally intended this to work, but have disallowed it because of your veto in c73e1f9

What was the expected behavior?

With two proxies, what does the preamble look like?

If I add backend d { .host = "same-as-c"; .via = a; } to my previous snippet, do we effectively get separate connection pools for backends c and d?

I feel this is both under-documented and something easier to add later than the other way around. The latter would be a breaking change.

dridi avatar Aug 30 '22 14:08 dridi

@Dridi apologies for my bad wording: I meant to say that I originally intended this to work (this PR is the last in a series of iterations), and the current implementation, as you rightly point out, does not support the scenario properly, because we neither recursively resolve the endpoint nor stack the preambles. It is thus correct to forbid the scenario and to test and document it, as you requested.

How stacking via backends could work in the future has been discussed during the last VDD, and I would want to avoid reopening the discussion here. There are also some ideas in the 7d698ee1b3f592017415005144ba7fc2b2f9fe34 commit message.

nigoroll avatar Aug 31 '22 09:08 nigoroll

FYI: As of this comment, the force-push is just a rebase onto master. After this comment, I am going to push the changes addressing @Dridi's feedback as individual commits.

nigoroll avatar Feb 20 '23 15:02 nigoroll

The rebase after this comment squashes the nitpicks raised by @Dridi.

nigoroll avatar Feb 20 '23 15:02 nigoroll