http icon indicating copy to clipboard operation
http copied to clipboard

Add lifetime to HttpValues for non-copying response headers

Open jespersm opened this issue 3 years ago • 9 comments

This is a draft PR for adding lifetimes HeaderValues, HeaderMap, and Response, so that header values can borrow from something with some longer lifetime.

The patch is implemented by suggestion from @SergioBenitez: This would reduce allocation in a number of cases and would match Rocket's current API for https://github.com/SergioBenitez/Rocket/issues/1498

There appears to be a slight time overhead due to the enum storing either a &[u8] reference instead of just the Bytes struct, but no noticeable impact otherwise. I've included a benchmark for the borrow case, and it appears to give the expected benefits.

As submitted, the patch breaks 9 doctests, and needs a bit of doc touches. I'll fix those if there is interest in adding this borrowing variation.

jespersm avatar Feb 27 '21 23:02 jespersm

Thanks for the PR! I'd be happy to discuss the feature/design before dealing with the complaining doc tests.

I'd say my biggest question is around motivation. I understand that the desire is to have a version that works with borrow slices, but I'd like to ask a few more "whys" to better understand. What are some genuine use cases where this variant is needed over what the cheap cloning of Bytes provides?

As you've noticed, this adds some type complexity (lifetime proliferation). This is probably the biggest impact, as it would be a breaking change and possibly have a large impact adding lifetimes through applications and library layers. I also noticed that this add a lifetime to the Response, but not to the Request. Assuming we got over the motivation part, would both message types want the same ability?

There appears to be a slight time overhead due to the enum storing either a &[u8] reference instead of just the Bytes struct, but no noticeable impact otherwise.

What overhead are you seeing, and how are you measuring? It looks like this would add another branch to all accesses of the header value, and likely would make the type bigger by an extra word.

seanmonstar avatar Mar 02 '21 01:03 seanmonstar

I'd say my biggest question is around motivation. I understand that the desire is to have a version that works with borrow slices, but I'd like to ask a few more "whys" to better understand. What are some genuine use cases where this variant is needed over what the cheap cloning of Bytes provides?

In isolation, with a narrow view focused on http and hyper, the motivation is nil. However, this is due to a shortcoming in http and hyper, namely, that the APIs do not allow borrowing from anywhere. It is impossible, in hyper or http, to reuse slices of the request or to store parsed data on the stack and then reference it later in multiple places.

By contrast, Rocket makes this trivial:

#[get("/<foo..>")]
fn redirect<'r>(foo: Origin<'r>) -> Redirect<'r> {
    Redirect::to(foo)
}

The hope would be that this handler can generate a response without allocation. Rocket parses the URI into an Origin without allocating, so the request part is allocation-free. Redirect needs to set a Location header. Ideally, the header's value could be set to the borrowed string slice in foo to avoid allocating. Unfortunately, http::HeaderMap forces us to allocate, unnecessarily. These kinds of applications are important in performance-sensitive environments.

In general, we want to be able to emit header names and values that borrow from the request or derived request data when we can; why should we be forced to allocate needlessly?

As you've noticed, this adds some type complexity (lifetime proliferation). This is probably the biggest impact, as it would be a breaking change and possibly have a large impact adding lifetimes through applications and library layers.

While the first part is true, the second isn't: applications and libraries that don't care about potentially unnecessary allocations can use an http::HeaderMap<'static>. You could even export an alias type OwnedHeaderMap = HeaderMap<'static>; so that the transition is as simple as s/HeaderMap/OwnedHeaderMap. Downstream dependencies should see little to no impact.

SergioBenitez avatar Mar 02 '21 02:03 SergioBenitez

As @SergioBenitez adressed the motive for the PR, I'll address the performance part:

What overhead are you seeing, and how are you measuring? It looks like this would add another branch to all accesses of the header value, and likely would make the type bigger by an extra word.

Yes, there is an extra branch, which I measured using the benches provided in the http crate itself and saw a performance degradation in the last digit nanosecond range on my 2016 hardware. I also saw some noise on unrelated benches, so it would make sense to perform new tests.

I haven't examined the size of each HeaderValue, but I'm guessing that the enum tag would be packed close to the is_sensitive boolean so that the struct would not hit a new alignment boundary on most platforms.

@seanmonstar: I'll gladly make better performance evaluations, if you are considering the PR. I also fully understand if you think the API change is not worth it, and I'll close it. :-)

jespersm avatar Mar 13 '21 12:03 jespersm

Would it be possible to avoid the breaking change by changing it to type HeaderMap = BorrowableHeaderMap<'static>; and then rocket can use BorrowableHeaderMap?

dr3s avatar Mar 24 '21 02:03 dr3s

ping @seanmonstar

SergioBenitez avatar Mar 25 '21 04:03 SergioBenitez

@seanmonstar Ping!

If you are unable or unwilling to review or weigh in on this, can you hand this to someone else to do so?

SergioBenitez avatar Apr 15 '21 19:04 SergioBenitez

I'll add my new thoughts soon, but let me ping @hyperium/http.

seanmonstar avatar Apr 16 '21 00:04 seanmonstar

This is due to a shortcoming in http and hyper, namely, that the APIs do not allow borrowing from anywhere. It is impossible, in hyper or http, to reuse slices of the request or to store parsed data on the stack and then reference it later in multiple places.

The http types are a compromise, sure. They aim for correctness, speed, and ease-of-use. While it is true that they can't take a reference to something on the stack, it is possible to reuse slices from the request. This is because http uses Bytes internally, which provides cheap copies and subslices to be made.

For example, you show what looks like code that parsed some Origin type from the request. Since that value likely either came from the Uri's PathAndQuery, it's almost just as cheap to have just taken a Bytes::slice, and then used that to make a new HeaderValue for the redirect, without having to allocate and copy the actual bytes. Is there a reason Rocket couldn't take advantage of that?

In my experience, Cow and similar types are useful inside a function, such as when parsing and you reach for something like String::from_utf8_lossy, but it becomes much more complicated when the code wants to return that type, especially if you have cows borrowing from different sources. I usually end up seeing that code just turn into Cow<'static, Thing>. Which then, in my opinion, the type has lost much of its purpose, and is just worse Bytes.

But that's why I originally mentioned wanting to "ask a few more why's". Even beyond just inside hyper, I haven't been exposed to many cases where this change would have been worth the downsides. If there are significant use cases, I'd like to hear. For instance, as I mentioned, I think the example you posted about Rocket and Origin could be solved with Bytes. But maybe a for more "why"s would reveal why it cannot.

seanmonstar avatar Apr 22 '21 23:04 seanmonstar

For example, you show what looks like code that parsed some Origin type from the request. Since that value likely either came from the Uri's PathAndQuery, it's almost just as cheap to have just taken a Bytes::slice, and then used that to make a new HeaderValue for the redirect, without having to allocate and copy the actual bytes. Is there a reason Rocket couldn't take advantage of that?

This would require Rocket to use Bytes internally as well as to wrap all "borrowed" return types in new types which hold a Bytes. The same would need to be true for any of a user's structures which want to hold a "borrow". This would also make it impossible to borrow from sources that didn't start as Bytes without copying.

In other words, this would have a viral effect on both Rocket's internal and external APIs and user's applications in a manner that I, and I imagine many of Rocket's users, would find unintuitive.

In my experience, Cow and similar types are useful inside a function, such as when parsing and you reach for something like String::from_utf8_lossy, but it becomes much more complicated when the code wants to return that type, especially if you have cows borrowing from different sources. I usually end up seeing that code just turn into Cow<'static, Thing>. Which then, in my opinion, the type has lost much of its purpose, and is just worse Bytes.

I'm not sure I follow what this part of your comment was directed at. I believe you're suggesting that Bytes is a better fit than Cow<'static, Thing> for most use-cases. If so, I don't understand: a Cow<'static, T> is only like a Bytes when T is a [u8]. In Rocket, we validate bytes as strings and other structures; there's no way to preserve that validation and maintain "borrowing" without wrappers: we'd need a wrapper for strings, paths, Rocket's custom string types (RawStr, UncasedStr), and so on.

More importantly, a Cow<'static, T> is only like Bytes when 'static. This is the crux: to remove that 'static bound.

But that's why I originally mentioned wanting to "ask a few more why's". Even beyond just inside hyper, I haven't been exposed to many cases where this change would have been worth the downsides. If there are significant use cases, I'd like to hear. For instance, as I mentioned, I think the example you posted about Rocket and Origin could be solved with Bytes. But maybe a for more "why"s would reveal why it cannot.

I hope the above makes it clear why this would be incredibly undesirable. Hyper's API forces the end-application not to borrow, at all. A Bytes is not a borrow nor a substitute for borrowing. At best, it escapes the type system's notion of borrowing which introduces the issues I've laid out above.

As a "lower-level" library, hyper has to be flexible in allowing "higher-level" libraries like Rocket to be performant and expose idiomatic APIs. Rocket already works around Hyper in a lot cases to make this possible (for instance, Hyper also can't hold a response that borrows from anywhere, so we use a channel to send the response). Copying headers is a case where we haven't found a workaround or where a workaround would be too expensive.

Which downsides do you see to making this change? The API surface doesn't change, a type alias makes the vast majority of application continue to compile, and libraries like Rocket and its applications can borrow from anywhere idiomatically.

SergioBenitez avatar Jun 23 '21 00:06 SergioBenitez