Customizing FS filters options
Hi, I'm trying to serve a file (a big one). I tried that code :
extern crate warp;
use warp::Filter;
fn main() {
let big = warp::get2()
.and(warp::path::end())
.and(warp::fs::file("./big"));
warp::serve(big).run(([127, 0, 0, 1], 3030));
}
But when I tried to download the file, it's incredibly slow !
$ wget "http://localhost:3030/" -O /dev/null
~67MB/s
While, the same using Go :
$ wget "http://localhost:8080/big" -O /dev/null
~440MB/s
Here is the SC :
package main
import (
"log"
"net/http"
)
func main() {
http.HandleFunc("/big", func(w http.ResponseWriter, r *http.Request) {
http.ServeFile(w, r, "big")
});
log.Fatal(http.ListenAndServe(":8080", nil))
}
Isn't that a performance issue ?
To give you some context. I'm trying to serve big file using Rust. The standard Go server is pretty good at this and currently I'm not able to implement a Rust web server which can compete the Go one, but i didn't expect the warp web framework to be so slow !
Did i miss something ?
My first assumption is that Go is able to take advantage of sendfile, reducing copies from the file into userland and then the copy sent on the socket.
That sound realistic (and interesting) !
But it doesn't explain such low performance.
Even without using sendfile. A simple Stream reading a file by chunk from the disk using tokio (or std::file) and hyper (using wrap_stream) will be 3 times better (on my laptop) than the current method of warp to send a file.
I can't confirm that Go use sendfile (but it's likely because a comment in the server.go of the package http says so).
I don't think it's possible to use sendfile currently with hyper. How should it be implemented ? (Should it be ?) A new method from Body ? Body::send_file that take a file descriptor ?
Ok so I checked how easy it will be to use sendfile as well as go does. (I checked with strace, go definitively use sendfile)
It's clearly hard.
Based on the fact that syscall are made by the standard lib. It has to be implemented first in TCPStream (and/or UDP too), then into Hyper, then into warp.
Not for tomorrow then. :disappointed:
Thank you for the hint ! I'm happy to understand why Go perform better than Rust here. ^^
Even so, the performance of warp to serve a file can be improved. I still doesn't understand why it's so slow.
You say it's slower than doing it through hyper? That's surprising! Care has been taken to try to provide an optimized file stream in warp. What does the code look like for hyper that serves the file faster?
Sorry I should have been more reactive ! Here is the first code I tried using tokio.
extern crate tokio;
extern crate hyper;
use futures::{Poll, Async, Stream, Future};
use std::io::{Error, ErrorKind};
use hyper::{Body, Response, Server, service::service_fn_ok};
use tokio::io::AsyncRead;
const BUF_SIZE : usize = 1<<13;
struct TokioStream {
file: tokio::fs::File,
buf: Vec<u8>,
}
impl Stream for TokioStream {
type Item = Vec<u8>;
type Error = Error;
fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
match self.file.poll_read(&mut self.buf) {
Ok(Async::Ready(0)) => {
Ok(Async::Ready(None))
},
Ok(Async::Ready(_size)) => {
Ok(Async::Ready(Some(Vec::from(&self.buf[..]))))
},
Ok(Async::NotReady) => Ok(Async::NotReady),
Err(_) => Err(Error::new(ErrorKind::Other, "😭")),
}
}
}
fn main() {
let addr = "[::1]:1337".parse().unwrap();
let service = || {
service_fn_ok( |_| {
let task = tokio::fs::File::open("big")
.and_then( |file| {
Ok(TokioStream{ file: file,
buf: vec!(0; BUF_SIZE)
})
});
match task.wait() {
Err(_) => Response::new(Body::from("Please (╥_╥)")),
Ok(s) => Response::new(Body::wrap_stream(s)),
}
})
};
let server = Server::bind(&addr)
.serve(service)
.map_err(|e| eprintln!("Error: {}", e));
hyper::rt::run(server);
}
I note that the size of the buffer greatly influences performance. So using a buffer's size of 1<<12 will make the "hyper code" as slow as the "warp one".
Ok(Async::Ready(_size)) => {
Ok(Async::Ready(Some(Vec::from(&self.buf[..]))))
}
Curious, without checking the number of bytes read, won't this sometimes send chunks of zeros (or whatever was left from a previous read)?
What OS are you testing this on? warp will try to use the blocksize from the file system to pick the best buffer size: https://github.com/seanmonstar/warp/blob/master/src/filters/fs.rs#L436
Ok(Async::Ready(_size)) => { Ok(Async::Ready(Some(Vec::from(&self.buf[..])))) }Curious, without checking the number of bytes read, won't this sometimes send chunks of zeros (or whatever was left from a previous read)?
Yep it does ! The last chunk with zeros.
It's &self.buf[.._size] ^^" Sorry, I tried a lot of things.
What OS are you testing this on? warp will try to use the blocksize from the file system to pick the best buffer size: https://github.com/seanmonstar/warp/blob/master/src/filters/fs.rs#L436
Arch Linux - Updated.
$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
I will check the buffer size (and blocksize) as soon as possible. ^^
So I checked and my device block size is actually 1<<12 (4096B).
However, it seems more interesting to take larger block sizes. Why ?
I don't know.
The laptop is really old but has no particularity.
So, something I've been thinking of is adding some sort of builder for fs filters, to allow customizing away from the defaults. This could include setting a desired read buffer size to override checking the blocksize. It'd be something like this:
let file = warp::fs::config()
.read_buffer_size(1 << 13)
// maybe you know the content-type better
.content_type("text/x-my-format")
// force a download (`content-disposition: attachment`)
.attachment()
// creates the `impl Filter` with the options set
.file(path);
Sounds great ! :D
I will be happy to implement that. I'm studying for my exams ATM but will work on it ASAP. I may ask some questions.
To anyone: Feel free to start something before I do.
I'd just like to add I also see very slow fs performance on Debian - it takes about 1.5 secs to transfer a 1.5Mb file on the same machine. I was previously using hyper with hyper-staticfile without any issues.
In addition, is there a suggested way to add etags to files served using fs::dir?
@caolan I've just pushed a commit to master that will check the blocksize, but only use it if it's over 8192. Does that help your case?
Etag configuration isn't yet supported, but it'd be good to add!
@seanmonstar no noticeable difference with master (e725bca) unfortunately
I also have limited performance on Raspbian. I cannot exceed 1.7 MB/s with master
I just noticed something that was preventing from using the proper read buffer size, will be fixed in https://github.com/seanmonstar/warp/pull/409.
@seanmonstar I've tested again with master (49ed427) and can confirm the performance has greatly improved. A 1.5mb file that was take ~2 seconds to load on my laptop is now taking ~100-120ms.
I have a setup with files that are precompressed:
app-3b9b685e302fd7e597ba.js
app-3b9b685e302fd7e597ba.js.gz
I'd like to be able to say "if the request supports gzip, and there's a compressed file, then serve it, otherwise the uncompressed version". I also want to add headers based on the creation/modification date of the file.
I don't expect that this is a super common practice, so I don't expect warp to support it out of the box. What would be nice is to have access to some of the pieces to combine them in my own way.
For example, I don't want to re-write the path traversal protection that Warp (presumably) has to get the path from the request, nor do I want to re-write the part of the code that takes a File and returns it.
Is there a possibility of gaining access to these components?
@shepmaster
Yea, adding some more building blocks would likely help people build their specific requirements.
I don't want to re-write the path traversal protection that Warp (presumably) has to get the path from the request
Agreed, so what building block makes sense here? A way to join a file Path with warp::path::tail(), maybe?
nor do I want to re-write the part of the code that takes a File and returns it.
I think this would be #171, or close to it.
what building block makes sense here
The way that Rocket does it, and I appreciate, is to make the conversion from URL path to PathBuf perform the check. In Warp syntax, maybe something like:
warp::get("assets").and_then(file_path()).map(|path: PathBuf| {
// `path` does not have traversal problems
})
If someone wanted the path without the check applied, I'd suggest a newtype (e.g. UncheckedPath(PathBuf)) that they can opt into. This way, the default / easy path is the safer path.
As an aside, I think (but have not tested) that the current traversal protection is overly aggressive. For the assets/ example, I think that assets/alpha/../beta should be fine, but from my reading of the code it would be rejected.
Some other options I would like to have control over in fs (file and dir)
- ability to set/use etags
- ability to not set last modified
- ability to set Cache-Control header (or any other headers?)
Tight now the cache handling for the default fs:dir() is horrible as deploying new updates to the files passing through are rarely detected and require a force-refresh in the browser to pickup. So being able to adjust how html files are handled separately from say assets (js, css, images) which I have cache-busting names for would make this significantly more usable.
@seanmonstar is the contract you suggested in https://github.com/seanmonstar/warp/issues/209#issuecomment-491006819 acceptable? As I can start a PR that adds that contract.