Add support for CURLOPT_RESOLVE, CURLOPT_FORBID_REUSE, CURLOPT_FRESH_CONNECT

kathampy opened this issue Sep 24 '19 · 16 comments

I'll try to implement these shortly. Just filing an issue for tracking.

CURLOPT_RESOLVE is nice to have in addition to setting DNS servers per request. I'd like to override some DNS resolutions per request. https://docs.rs/curl/0.5.0/curl/easy/struct.Easy2.html#method.resolve
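
For reference, the underlying curl crate already exposes this. A minimal sketch of the call (the host/address mapping here is made up for illustration):

use curl::easy::{Easy, List};

fn main() -> Result<(), curl::Error> {
    // Each CURLOPT_RESOLVE entry has the form "HOST:PORT:ADDRESS"; curl then
    // uses ADDRESS for HOST:PORT instead of doing a DNS lookup.
    let mut overrides = List::new();
    overrides.append("example.com:443:93.184.216.34")?;

    let mut handle = Easy::new();
    handle.resolve(overrides)?;
    handle.url("https://example.com")?;
    handle.perform()?;
    Ok(())
}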

CURLOPT_FORBID_REUSE or CURLOPT_FRESH_CONNECT is nice for HTTP benchmarking purposes. I hope one of them allows redirects to reuse the connection.

kathampy · Sep 24 '19

These sound like great things to support! 👍

sagebind · Sep 24 '19

Proposed API for controlling connection caching per client:

impl HttpClientBuilder {
    /// Set the size of the connection cache. Setting to `None` disables connection
    /// caching.
    fn connection_cache_size(mut self, size: impl Into<Option<usize>>) -> Self;
}

Setting it to a nonzero value configures CURLMOPT_MAXCONNECTS, while setting it to None enables CURLOPT_FORBID_REUSE. In addition, on specific requests:

trait RequestExt {
    /// Do not re-use a connection from the connection cache (if enabled) for this request.
    fn without_connection_cache(&mut self) -> &mut Self;
}

Calling this method on a request sets CURLOPT_FRESH_CONNECT for that specific request.
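
Both knobs already exist on the curl crate types that Isahc wraps, so the plumbing would look roughly like this (a sketch of the idea, not Isahc's actual internals):

use curl::easy::Easy;
use curl::multi::Multi;

fn apply_cache_config(
    multi: &mut Multi,
    easy: &mut Easy,
    cache_size: Option<usize>,
    fresh: bool,
) -> Result<(), Box<dyn std::error::Error>> {
    match cache_size {
        // CURLMOPT_MAXCONNECTS: how many idle connections the pool may keep.
        Some(size) => multi.set_max_connects(size)?,
        // No cache at all: close connections instead of returning them to the
        // cache (CURLOPT_FORBID_REUSE).
        None => easy.forbid_reuse(true)?,
    }
    if fresh {
        // Per-request opt-out: force a brand new connection (CURLOPT_FRESH_CONNECT).
        easy.fresh_connect(true)?;
    }
    Ok(())
}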

sagebind · Oct 11 '19

There's value in supporting both CURLOPT_FRESH_CONNECT and CURLOPT_FORBID_REUSE per request. I use the same client for benchmarking purposes and other calls to the same servers. I don't want the benchmarking connections (CURLOPT_FORBID_REUSE) to be reused by the other calls, nor the other connections to be reused by the benchmarking calls (CURLOPT_FRESH_CONNECT). The benchmarking calls should not reuse their own connections either (CURLOPT_FRESH_CONNECT). So I would set both options on the benchmarking calls. The other calls should still benefit from connection reuse.

In the case of HTTP/2, I wonder if CURLOPT_FORBID_REUSE also prevents other calls from multiplexing requests into an active benchmarking connection. This is quite important.

trait RequestExt {
    /// Close the connection on completion of this request and do not return it to the connection cache.
    fn remove_from_cache(&mut self) -> &mut Self;
}

kathampy · Oct 11 '19

The CURLOPT_RESOLVE API would look just like the dns_servers API, I assume.

kathampy · Oct 11 '19

Maybe we could set both CURLOPT_FRESH_CONNECT and CURLOPT_FORBID_REUSE in one call for the per-request API? Effectively the meaning would be that the request in question does not partake in the connection cache at all, if the client has one.

sagebind · Oct 11 '19

That would address my current requirement. Someone else may want them separate. The behaviour for redirects within a request is unknown.

kathampy · Oct 11 '19

> In the case of HTTP/2, I wonder if CURLOPT_FORBID_REUSE also prevents other calls from multiplexing requests into an active benchmarking connection.

Looking at the source, it appears the answer is no. The only way to avoid multiplexing on an active connection that supports it is to explicitly request a fresh one (CURLOPT_FRESH_CONNECT) or to disable it client-wide (CURLMOPT_PIPELINING). The only thing CURLOPT_FORBID_REUSE does is ask curl to close the connection after we are done with it instead of returning it to the connection cache (and even then, in certain circumstances, this option is not respected, like if a connection is currently undergoing an auth handshake). While the request is active, the connection is available for other requests to use.

> I don't want the benchmarking connections (CURLOPT_FORBID_REUSE) to be reused by the other calls

I'm diving into this feature currently, and I am having difficulty imagining the use case for this within a single client, seeing as CURLOPT_FORBID_REUSE doesn't actually prevent other requests from reusing the connection while it is active (unless they have CURLOPT_FRESH_CONNECT enabled). Really, the only use for CURLOPT_FORBID_REUSE is to avoid putting inactive connections in the cache.

For your specific use case, I would create two separate clients configured differently. Regardless of how Isahc's API presents the options, it seems like the behavior you are after isn't supported by libcurl. For true connection isolation you will need separate clients.

sagebind · Oct 18 '19

Does CURLOPT_FRESH_CONNECT also force new connections for redirects to other domains (A > B) that may have been previously cached in a different request (it should)? And will multiple redirects between domains (A > B > A) within the same request reuse connections (they should)?

kathampy · Oct 18 '19

When inside a redirect it appears that CURLOPT_FRESH_CONNECT is completely ignored intentionally: https://github.com/curl/curl/blob/e062043433381fa5fd2f90b2fcc9dc912dbb79f6/lib/url.c#L3577-L3580. After every followed redirect, the existing easy handle is essentially reset to point to the new URL, except that this_is_a_follow is set to true (and other bookkeeping).

sagebind · Oct 18 '19

So a redirect to domain B from A could reuse a connection to B from an unrelated request, even with CURLOPT_FRESH_CONNECT?

kathampy · Oct 18 '19

That's the way the code reads, yeah, unless there's something I am misreading that a libcurl maintainer might know better.

sagebind · Oct 18 '19

It looks like easy handles can be isolated within the multi interface by setting a separate share handle on them. CURL_LOCK_DATA_CONNECT in a new share handle would effectively prevent connection reuse from the multi pool for redirects to different domains, but still allow reuse within the easy handle redirect chain. (CURLOPT_FRESH_CONNECT would not be required even for the initial connection.)

You could provide an isolate method for an isahc request that creates a new share handle with all the share options set.
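
At the libcurl level that would be roughly the following. This is only a sketch against the raw curl-sys bindings (the high-level curl crate does not expose the share interface); it sets just the connect and DNS locks, and it leaks the share handle for brevity:

use curl::easy::Easy;
use curl_sys::{
    curl_easy_setopt, curl_share_init, curl_share_setopt, CURLOPT_SHARE,
    CURLSHOPT_SHARE, CURL_LOCK_DATA_CONNECT, CURL_LOCK_DATA_DNS,
};

fn main() -> Result<(), curl::Error> {
    // One share handle per isolated request chain. With CURL_LOCK_DATA_CONNECT
    // (curl >= 7.57.0), the connection cache lives in the share handle rather
    // than in the multi handle's shared pool.
    let share = unsafe { curl_share_init() };
    unsafe {
        curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_CONNECT);
        curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);
    }

    let mut easy = Easy::new();
    easy.url("https://example.com")?;
    unsafe {
        // CURLOPT_SHARE: redirects within this request may reuse its own
        // connections and DNS entries, but nothing is shared with other requests.
        curl_easy_setopt(easy.raw(), CURLOPT_SHARE, share);
    }
    easy.perform()?;
    Ok(())
}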

kathampy · Oct 18 '19

Note that CURL_LOCK_DATA_CONNECT does not do anything until curl 7.57.0, which is somewhat recent; if running against an older version of curl, this approach will still not be effective.

sagebind · Oct 26 '19

It's better to have a documented isolate() function that works correctly on curl ≥ 7.57.0 than CURLOPT_FRESH_CONNECT with undefined internal behaviour.

kathampy · Oct 26 '19

I'm experimenting with the share API here: #105

I understand your use case and I want to help, but I also do not want to add any configuration that isn't guaranteed to just work out of the box, because it reduces confidence in the API and makes it difficult for me to support. dns_servers is an example of that which I consider a mistake (but removing it would break backwards compatibility).

My understanding is that the requirements are:

  • New requests should use a new connection.
  • DNS should always be resolved.
  • Resources should not be shared between multiple active requests.

Would something like this work for you?

use isahc::{config::DnsCache, HttpClient};
use lazy_static::lazy_static;

lazy_static! {
    static ref BENCHMARK_CLIENT: HttpClient = HttpClient::builder()
        // Disable connection caching, available on master today.
        .connection_cache_size(0)
        // Don't cache DNS results, also on master today.
        .dns_cache(DnsCache::Disable)
        // Disable multiplexing multiple requests on the same connection,
        // something we could add easily.
        .disable_multiplexing()
        .build()
        .unwrap();
}

Using lazy_static gets you the same lifetimes as using the global client; indeed the global client is basically just HttpClient::new() in a lazy_static.

A client-wide ability to disable multiplexing would be easy to add and support, and would work anywhere (via CURLMOPT_PIPELINING). On old versions of curl (older than 7.43.0) multiplexing wasn't supported and so it is already "disabled", and on newer versions we can disable it explicitly. So the above client configuration would get you the behavior you expect regardless of what version of curl is used.
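
On the curl crate side that is a one-liner on the multi handle (sketch):

use curl::multi::Multi;

fn main() -> Result<(), curl::MultiError> {
    let mut multi = Multi::new();
    // CURLMOPT_PIPELINING: disable both HTTP/1.1 pipelining and HTTP/2
    // multiplexing, so every concurrent request gets its own connection.
    multi.pipelining(false, false)?;
    Ok(())
}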

sagebind · Oct 26 '19

Multiplexing would need to be disabled at the client level, and I can use a new static client for that. A redirect from A > B > A must reuse the connection and DNS cache for A. If curl can decide to reuse a connection without peeking at DNS, then a completely disabled DNS cache shouldn't matter in this case.

An HTTP to HTTPS redirect would create a new connection, but must use the DNS cache; I don't think this would work if DNS caching is disabled in the client. HTTP to HTTPS redirects are very common in my benchmark URLs and would add unnecessary DNS resolution latency to a lot of requests.

Each request should make use of connection caching, DNS caching, etc. when there are redirects from the first request, but not share them with other requests. Another way to think of it is that each request is like calling curl from the CLI: it should benefit from caching like a real user's browser or client software, but be completely isolated from other requests.

kathampy · Oct 26 '19