hyper icon indicating copy to clipboard operation
hyper copied to clipboard

Feature: Consider splitting the protocol implementation into its own crate

Open notgull opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Hyper is one of my favorite Rust crates; however, there is an issue that I have with it that I probably should have brought up earlier in the 1.0 release process. While it is a fantastic crate with a rock-solid HTTP implementation, the only issue is that, in order to access this HTTP implementation, you need to use the API that Hyper provides.

It would be nice if there were a separate crate, to the turn of hyper-protocol, that would allow access to the protocol bits. This would allow other users to use this crate to build alternate APIs on top of the hyper protocol implementation.

Describe the solution you'd like

There would be a hyper-protocol crate that would provide an API without any I/O involved. This means that there is no async, no blocking, no Read or Write; just functions/structures that write to or read from in-memory buffers. hyper itself would then build atop this API to provide its current API.

This approach would have a number of advantages:

  • The resulting crates would be more simple to test. Since you're testing the protocol and not the I/O bits, the branch space will shrink significantly, allowing test coverage to cover a larger number of areas.
  • Without the I/O bits, it's easier to reason about the protocol code and ensure that it is correct.
  • The HTTP protocol implementation can be reused by other crates. For instance, crates like ureq and hring provide their own HTTP implementations. To reduce the amount of work done by the open-source community as a whole, it would be nice to have them all use only one HTTP implementation.

See this website for more information.

Describe alternatives you've considered

While there's nothing wrong with the status quo, it's just that this means that other crates have to provide their own HTTP implementations. Reducing the number of other implementations would reduce the amount of work the Rust community needs to do.

Additional context

See this website for more information on the sans-I/O idea as a whole. It applies mostly to Python but I think that it also makes sense for Rust.

For a success story, see the x11rb-protocol crate, which I recently helped split out from x11rb. This allows for x11rb to use sync code and my own crate, breadx to use async code which reducing work by being based on the same X11 protocol implementation.

I am definitely willing to help out with this issue.

notgull avatar Nov 17 '22 03:11 notgull

Something like this does sound good, and I have in the past sketched up a markdown document about pulling out the HTTP/1 protocol into a separate crate, like h2 and h3. (Maybe I can find it...)

I don't think this would require anything in hyper's public API to change, so it wouldn't block hyper's 1.0 to figure this out. Thus, I'm most focused on that. But trying to come up with a design doc around it would be fine!

seanmonstar avatar Nov 17 '22 20:11 seanmonstar

Thus, I'm most focused on that. But trying to come up with a design doc around it would be fine!

Great! I'll sketch out a full Markdown RFC later this week.

notgull avatar Nov 17 '22 21:11 notgull

RFC: Separate the I/O processing and the parsing/handling of the HTTP protocol

Summary

hyper should adopt the sans-I/O model, and separate its protocol code out from the code responsible for I/O. It should create another crate, hyper-protocol, which contains the code for parsing and handling the HTTP protocol. hyper would then depend on hyper-protocol, and then use it to handle the HTTP protocol.

Motivation

hyper is, far and away, the most popular HTTP implementation for Rust. It uses hand-written HTTP code that is as solid as a rock. Like the README says, it's correct. However, what about other HTTP implementations? ureq and attohttpc write their own HTTP code. In addition, the in-development hring package is in the process of writing its own implementation as well. By my count, that's four separate implementations of the HTTP protocol.

HTTP is a notoriously complex protocol; a real-world implementation will have to look past the RFCs in order to deal with corner cases that show up in the real world. Most of the time, these smaller projects do not have the resources to deal with these corner cases. As a result, their HTTP implementations may fail on real-world input or be incomplete. In addition, the fact that there are multiple implementations of the HTTP protocol means that the Rust open source community has to duplicate a significant amount of work to maintain all of this code. As the open-source community is built on the backs of volunteers with little free time, this is a significant burden. A solution to this problem would be to have a separate protocol crate, independent of any I/O decisions. That way, all of the HTTP protocol code can be maintained in one place, and the end-consumer crates (including hyper) can use it to handle the HTTP protocol.

There are many other reasons to separate the I/O from the protocol; for instance, it makes testing the protocol code much easier. See this website for more rationale and information.

Guide-Level Explanation

There will be a new hyper-protocol crate. The public API of this crate will look like this:

use http::{Request, Response};

pub struct Client {
    // ...
}

pub type ClientReadResult = Result<Option<Response>, Error>;

impl Client {
    pub fn builder() -> ClientBuilder { /* ... */ }
    pub fn needs_read_data(&self) -> usize { /* ... */ }
    pub fn read_data(&mut self, data: &[u8]) -> ClientReadResult { /* ... */ }
    pub fn send_request(&mut self, request: Request) -> Result<(), Error> { /* ... */ }
    pub fn needs_write_data(&self) -> usize { /* ... */ }
    pub fn write_data(&mut self, out: &mut [u8]) -> Result<usize, Error> { /* ... */ }
}

pub struct Server {
    /* ... */
}

pub type ServerReadResult = Result<Option<Request>, Error>;

impl Server {
    pub fn builder() -> ServerBuilder { /* ... */ }
    pub fn needs_read_data(&self) -> usize { /* ... */ }
    pub fn read_data(&mut self, data: &[u8]) -> ServerReadResult { /* ... */ }
    pub fn send_response(&mut self, response: Response) -> Result<(), Error> { /* ... */ }
    pub fn needs_write_data(&self) -> usize { /* ... */ }
    pub fn write_data(&mut self, out: &mut [u8]) -> Result<usize, Error> { /* ... */ }
}

The point is that the entire API offsets any kind of waiting onto the user, and that its functionality revolves around mostly reading to and writing from in-memory data buffers. To this end, running a HTTP client (synchronously) would look something like this:

use hyper_protocol::Client;
use std::net::TcpStream;

let mut client = Client::builder().http1();
let stream = TcpStream::connect(("google.com", 80))?;
let mut buffer = vec![0u8; 1024];

// Send a request.
client.send_request(/* ... */)?;

// Wait for activity.
loop {
    buffer.resize(client.needs_read_data(), 0);
    stream.read(&mut buffer)?;
    if let Some(response) = client.read_data(&buffer)? {
        // Handle the response...
    }

    buffer.resize(client.needs_write_data(), 0);
    let written_data = client.write_data(&mut buffer)?;
    if written_data > 0 {
        stream.write_all(&buffer[..written_data])?;
    }
}

I'm definitely down to bikeshed this API to make it easier to work with, as well as to more easily differentiate connection types so that HTTP 1.1, HTTP 2 and (in the future) HTTP 3 can be easily supported.

Reference-Level Explanation

This project would require certain crates to be split out into new crates.

  • Parts of the src/proto/h1 module in hyper would be split into a new crate named h1-proto or similar, which would handle HTTP 1.1 in a sans-I/O fashion.
  • Parts of the h2 crate would split into h2-proto or similar, which would handle HTTP 2 in a sans-I/O fashion.
  • Parts of the src/proto module and others in hyper would be split into a hyper-protocol crate which would use the API above.

Drawbacks

This would involve at least three more crates being added to the Rust ecosystem. While I'm not opposed to this, it's worth noting that this would be an increase in the number of crates that hyper depends on. In addition, it would likely make the current hyper crate more complex, as it would require taking some of the logic and turning it into a state machine.

Prior art

Recently, I helped split the x11rb crate into an x11rb-protocol crate. The split protocol is now used in my breadx crate in order to provide an alternative asynchronous API over the same protocol. This has worked out very well, and I think that it would be a good idea to do the same for hyper.

Another instance of a protocol/IO split is postgres-protocol.

notgull avatar Nov 21 '22 23:11 notgull

Disclaimer: offering thoughts after only skimming your draft RFC.

A few things to consider:

  • Currently hyper goes to decent lengths to avoid copying data where possible (using bytes::{Bytes, Buf}). That same care should be put into the API of the separate h1 crate.

  • There are some complicated timing and triggering events in http1. For example, the Expect: 100-Continue header and the corresponding HTTP/1.1 100 Continue\r\n\r\n response. A client sends the header when it wants to delay sending the request body until it knows the server wants to read it (e.g., client doesn't want to waste bandwidth sending a huge body if the server is going to say 400 Bad Request because of the headers). So in hyper there's a complicated dance between the reader and writer side of a connection. The first time the server attempts to read from the body, that will signal the connection to send the 100 Continue response. This complexity and others like it will need to somehow be expressed in the separate h1 crate.

BGR360 avatar Jan 12 '24 07:01 BGR360