hyper
hyper copied to clipboard
Feature: Consider splitting the protocol implementation into its own crate
Is your feature request related to a problem? Please describe.
Hyper is one of my favorite Rust crates; however, there is an issue that I have with it that I probably should have brought up earlier in the 1.0 release process. While it is a fantastic crate with a rock-solid HTTP implementation, the only issue is that, in order to access this HTTP implementation, you need to use the API that Hyper provides.
It would be nice if there were a separate crate, to the turn of hyper-protocol
, that would allow access to the protocol bits. This would allow other users to use this crate to build alternate APIs on top of the hyper
protocol implementation.
Describe the solution you'd like
There would be a hyper-protocol
crate that would provide an API without any I/O involved. This means that there is no async
, no blocking, no Read
or Write
; just functions/structures that write to or read from in-memory buffers. hyper
itself would then build atop this API to provide its current API.
This approach would have a number of advantages:
- The resulting crates would be more simple to test. Since you're testing the protocol and not the I/O bits, the branch space will shrink significantly, allowing test coverage to cover a larger number of areas.
- Without the I/O bits, it's easier to reason about the protocol code and ensure that it is correct.
- The HTTP protocol implementation can be reused by other crates. For instance, crates like
ureq
andhring
provide their own HTTP implementations. To reduce the amount of work done by the open-source community as a whole, it would be nice to have them all use only one HTTP implementation.
See this website for more information.
Describe alternatives you've considered
While there's nothing wrong with the status quo, it's just that this means that other crates have to provide their own HTTP implementations. Reducing the number of other implementations would reduce the amount of work the Rust community needs to do.
Additional context
See this website for more information on the sans-I/O idea as a whole. It applies mostly to Python but I think that it also makes sense for Rust.
For a success story, see the x11rb-protocol
crate, which I recently helped split out from x11rb
. This allows for x11rb
to use sync code and my own crate, breadx
to use async code which reducing work by being based on the same X11 protocol implementation.
I am definitely willing to help out with this issue.
Something like this does sound good, and I have in the past sketched up a markdown document about pulling out the HTTP/1 protocol into a separate crate, like h2 and h3. (Maybe I can find it...)
I don't think this would require anything in hyper's public API to change, so it wouldn't block hyper's 1.0 to figure this out. Thus, I'm most focused on that. But trying to come up with a design doc around it would be fine!
Thus, I'm most focused on that. But trying to come up with a design doc around it would be fine!
Great! I'll sketch out a full Markdown RFC later this week.
RFC: Separate the I/O processing and the parsing/handling of the HTTP protocol
Summary
hyper
should adopt the sans-I/O model, and separate its protocol code out from the code responsible for I/O. It should create another crate, hyper-protocol
, which contains the code for parsing and handling the HTTP protocol. hyper
would then depend on hyper-protocol
, and then use it to handle the HTTP protocol.
Motivation
hyper
is, far and away, the most popular HTTP implementation for Rust. It uses hand-written HTTP code that is as solid as a rock. Like the README says, it's correct. However, what about other HTTP implementations? ureq
and attohttpc
write their own HTTP code. In addition, the in-development hring
package is in the process of writing its own implementation as well. By my count, that's four separate implementations of the HTTP protocol.
HTTP is a notoriously complex protocol; a real-world implementation will have to look past the RFCs in order to deal with corner cases that show up in the real world. Most of the time, these smaller projects do not have the resources to deal with these corner cases. As a result, their HTTP implementations may fail on real-world input or be incomplete. In addition, the fact that there are multiple implementations of the HTTP protocol means that the Rust open source community has to duplicate a significant amount of work to maintain all of this code. As the open-source community is built on the backs of volunteers with little free time, this is a significant burden. A solution to this problem would be to have a separate protocol crate, independent of any I/O decisions. That way, all of the HTTP protocol code can be maintained in one place, and the end-consumer crates (including hyper
) can use it to handle the HTTP protocol.
There are many other reasons to separate the I/O from the protocol; for instance, it makes testing the protocol code much easier. See this website for more rationale and information.
Guide-Level Explanation
There will be a new hyper-protocol
crate. The public API of this crate will look like this:
use http::{Request, Response};
pub struct Client {
// ...
}
pub type ClientReadResult = Result<Option<Response>, Error>;
impl Client {
pub fn builder() -> ClientBuilder { /* ... */ }
pub fn needs_read_data(&self) -> usize { /* ... */ }
pub fn read_data(&mut self, data: &[u8]) -> ClientReadResult { /* ... */ }
pub fn send_request(&mut self, request: Request) -> Result<(), Error> { /* ... */ }
pub fn needs_write_data(&self) -> usize { /* ... */ }
pub fn write_data(&mut self, out: &mut [u8]) -> Result<usize, Error> { /* ... */ }
}
pub struct Server {
/* ... */
}
pub type ServerReadResult = Result<Option<Request>, Error>;
impl Server {
pub fn builder() -> ServerBuilder { /* ... */ }
pub fn needs_read_data(&self) -> usize { /* ... */ }
pub fn read_data(&mut self, data: &[u8]) -> ServerReadResult { /* ... */ }
pub fn send_response(&mut self, response: Response) -> Result<(), Error> { /* ... */ }
pub fn needs_write_data(&self) -> usize { /* ... */ }
pub fn write_data(&mut self, out: &mut [u8]) -> Result<usize, Error> { /* ... */ }
}
The point is that the entire API offsets any kind of waiting onto the user, and that its functionality revolves around mostly reading to and writing from in-memory data buffers. To this end, running a HTTP client (synchronously) would look something like this:
use hyper_protocol::Client;
use std::net::TcpStream;
let mut client = Client::builder().http1();
let stream = TcpStream::connect(("google.com", 80))?;
let mut buffer = vec![0u8; 1024];
// Send a request.
client.send_request(/* ... */)?;
// Wait for activity.
loop {
buffer.resize(client.needs_read_data(), 0);
stream.read(&mut buffer)?;
if let Some(response) = client.read_data(&buffer)? {
// Handle the response...
}
buffer.resize(client.needs_write_data(), 0);
let written_data = client.write_data(&mut buffer)?;
if written_data > 0 {
stream.write_all(&buffer[..written_data])?;
}
}
I'm definitely down to bikeshed this API to make it easier to work with, as well as to more easily differentiate connection types so that HTTP 1.1, HTTP 2 and (in the future) HTTP 3 can be easily supported.
Reference-Level Explanation
This project would require certain crates to be split out into new crates.
- Parts of the
src/proto/h1
module inhyper
would be split into a new crate namedh1-proto
or similar, which would handle HTTP 1.1 in a sans-I/O fashion. - Parts of the
h2
crate would split intoh2-proto
or similar, which would handle HTTP 2 in a sans-I/O fashion. - Parts of the
src/proto
module and others inhyper
would be split into ahyper-protocol
crate which would use the API above.
Drawbacks
This would involve at least three more crates being added to the Rust ecosystem. While I'm not opposed to this, it's worth noting that this would be an increase in the number of crates that hyper
depends on. In addition, it would likely make the current hyper
crate more complex, as it would require taking some of the logic and turning it into a state machine.
Prior art
Recently, I helped split the x11rb
crate into an x11rb-protocol
crate. The split protocol is now used in my breadx
crate in order to provide an alternative asynchronous API over the same protocol. This has worked out very well, and I think that it would be a good idea to do the same for hyper
.
Another instance of a protocol/IO split is postgres-protocol
.
Disclaimer: offering thoughts after only skimming your draft RFC.
A few things to consider:
-
Currently hyper goes to decent lengths to avoid copying data where possible (using
bytes::{Bytes, Buf}
). That same care should be put into the API of the separate h1 crate. -
There are some complicated timing and triggering events in http1. For example, the
Expect: 100-Continue
header and the correspondingHTTP/1.1 100 Continue\r\n\r\n
response. A client sends the header when it wants to delay sending the request body until it knows the server wants to read it (e.g., client doesn't want to waste bandwidth sending a huge body if the server is going to say 400 Bad Request because of the headers). So in hyper there's a complicated dance between the reader and writer side of a connection. The first time the server attempts to read from the body, that will signal the connection to send the100 Continue
response. This complexity and others like it will need to somehow be expressed in the separate h1 crate.