hyper-staticfile
Use spawn_blocking directly to read from files
The code is rough, and I'm not sure if this is a good idea. Just playing around with ideas in #24.
This bypasses tokio::fs and uses spawn_blocking directly, in a way inspired by Tokio's internals. That lets us skip Tokio's internal buffering and read directly into our own BytesMut.
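As a rough illustration of the idea (not the exact PR code; `read_chunk` is a made-up name, and for simplicity this zero-fills the buffer instead of reading into uninitialized capacity like the PR does):

```rust
use bytes::{Bytes, BytesMut};
use std::fs::File;
use std::io::Read;
use std::sync::Arc;
use tokio::task::spawn_blocking;

const BUF_SIZE: usize = 8 * 1024;

// Read the next chunk of the file on the blocking pool, straight into our own
// BytesMut, instead of going through tokio::fs::File's internal buffer.
async fn read_chunk(file: Arc<File>) -> std::io::Result<Option<Bytes>> {
    spawn_blocking(move || {
        let mut buf = BytesMut::with_capacity(BUF_SIZE);
        buf.resize(BUF_SIZE, 0); // zero-fill for simplicity; the PR reads into uninitialized capacity
        let n = (&*file).read(&mut buf[..])?; // &File implements Read
        buf.truncate(n);
        Ok(if n == 0 { None } else { Some(buf.freeze()) })
    })
    .await
    .expect("blocking read task panicked")
}
```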
It looks like, at least on macOS, the OS will fill whatever size buffer we give it.
Some relative results on my own machine, using the example server with `ab -n 1000 -c 10 http://127.0.0.1:3000/search-index.js` (a 975 KB file) and tweaking our own BUF_SIZE:
| Configuration | Throughput | Transfer rate |
| --- | --- | --- |
| before, 8 KB buffer | 244.64 req/s | 238639.96 KB/s |
| before, 16 KB buffer | 419.66 req/s | 409371.92 KB/s |
| after, 8 KB buffer | 274.51 req/s | 267781.17 KB/s |
| after, 16 KB buffer | 477.92 req/s | 466201.27 KB/s |
| after, 32 KB buffer | 774.12 req/s | 755135.07 KB/s |
| after, 64 KB buffer | 1143.46 req/s | 1115419.94 KB/s |
| after, 128 KB buffer | 1395.01 req/s | 1360802.62 KB/s |
| after, 256 KB buffer | 1437.07 req/s | 1401834.38 KB/s |
| after, 512 KB buffer | 1604.41 req/s | 1565070.82 KB/s |
| after, 1024 KB buffer | 1726.24 req/s | 1683912.61 KB/s |
These numbers vary a lot between runs, though. What I do find interesting is the apparent ~13% increase when comparing the two runs with 8 KB buffers.
Besides skipping a copy, the other thing I thought might make a difference is the change in open_with_metadata. Going through Tokio internals, we previously did separate spawn_blocking calls for the open and stat steps there, but I combined them in this PR (a rough sketch of the combined version follows below). To rule this out, I split the new code back into two manual spawn_blocking calls and got these numbers:
| Configuration | Throughput | Transfer rate |
| --- | --- | --- |
| two spawn_blocking calls, 8 KB buffer | 269.98 req/s | 263361.36 KB/s |
So that only accounts for maybe ~1.5% of that 13% gain.
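For reference, the combined version amounts to something like this (a minimal sketch, not the actual `open_with_metadata` in this crate, which also has to deal with path handling and the like):

```rust
use std::fs::{File, Metadata};
use std::path::PathBuf;
use tokio::task::spawn_blocking;

// One trip to the blocking pool for both the open and the stat, instead of the
// two separate spawn_blocking calls that tokio::fs would do for open + metadata.
async fn open_with_metadata(path: PathBuf) -> std::io::Result<(File, Metadata)> {
    spawn_blocking(move || {
        let file = File::open(path)?;    // blocking open
        let metadata = file.metadata()?; // blocking stat on the same handle
        Ok((file, metadata))
    })
    .await
    .expect("blocking open task panicked")
}
```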
cc @lnicola
That code seems (IIRC) pretty similar to what Tokio does, so I didn't really expect the performance improvement. One workaround suggested in that issue is to use task::block_in_place instead of task::spawn_blocking (or tokio::spawn together with task::block_in_place). Perhaps it's worth a try?
Our testing method is a bit different. I used a large file and a single client at a time.
I guess we should have both benchmarks in code somewhere, but I'll stick to ab for now, because it's what I have. :)
Using block_in_place requires the rt-threaded feature (the threaded scheduler), which by itself changes Tokio performance quite a bit. Performing the same test, starting from the current, unmodified code in this branch:
- New baseline for this test: 224.71 req/s - 219455.55 KB/s
- With `rt-threaded`: 479.57 req/s - 468364.66 KB/s
- With `rt-threaded`, and `block_in_place` in `open_with_metadata`: 501.74 req/s - 490019.86 KB/s
- With `rt-threaded`, and `block_in_place` in `open_with_metadata` and `FileBytesStream`: 2546.89 req/s - 2487372.11 KB/s
That seems kind of ridiculous.
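(For context, a minimal sketch of what "with rt-threaded" means here, assuming the tokio 0.2-style API where the feature is called `rt-threaded`; block_in_place panics on the basic single-threaded scheduler:)

```rust
// Requires tokio's "rt-threaded" feature in Cargo.toml.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rt = tokio::runtime::Builder::new()
        .threaded_scheduler() // block_in_place only works on this scheduler
        .enable_all()
        .build()?;

    rt.block_on(async {
        // ... start the example hyper server here ...
    });
    Ok(())
}
```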
I'm not even sure if using block_in_place inside a poll function is safe. This is what I was trying, essentially removing the State enum:
fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
    // Run the blocking read inline on this worker thread; block_in_place tells
    // the threaded scheduler to move other tasks off this thread first.
    match block_in_place(|| {
        // Make sure there is spare capacity, then grab the uninitialized tail of the buffer.
        self.buf.reserve(BUF_SIZE);
        let slice = self.buf.bytes_mut();
        // Treat the MaybeUninit spare capacity as a plain &mut [u8] for read().
        let slice = unsafe {
            std::slice::from_raw_parts_mut(slice.as_mut_ptr() as *mut u8, slice.len())
        };
        // &File implements Read, so no &mut self.file is needed.
        (&*self.file).read(slice)
    }) {
        // EOF: end the stream.
        Ok(0) => Poll::Ready(None),
        Ok(size) => {
            // Mark the bytes just read as initialized, then hand them out as a frozen chunk.
            unsafe { self.buf.advance_mut(size) };
            let retval = self.buf.split().freeze();
            Poll::Ready(Some(Ok(retval)))
        }
        Err(e) => Poll::Ready(Some(Err(e))),
    }
}
Well, that's a nice improvement, right? :-). Yeah, pity it won't work with the single-threaded runtime. Maybe we can do nothing and wait for Tokio to fix the fs performance.
Otherwise it's safe to use AFAICT.