ocsigenserver
Use buffer pool to avoid large buffer allocation for connections and responses
A trivial buffer pool is used to get rid of these buffer allocations:
- per connection/receiver buffers (Ocsigen_http_com)
- internal buffer in Ocsigen_senders.File_content
- internal buffer in deflatemod
Moreover, some care is taken to avoid allocation for the last chunk of a file when the remaining data doesn't match the buffer size.
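For intuition, here is a minimal sketch of what such a pool can look like; the module and function names below are illustrative only, not the actual code in this PR. Free buffers are grouped by rounded-up size in a hashtable of stacks; acquiring pops a free buffer or allocates a new one, and the returned closure pushes the buffer back so it can be reused.

```ocaml
(* Minimal, illustrative buffer pool -- not the actual code in this PR. *)
module Buffer_pool = struct
  (* size -> stack of free buffers of that size *)
  let pools : (int, Bytes.t Stack.t) Hashtbl.t = Hashtbl.create 16

  (* round the requested size up to the next power of two *)
  let round_up n =
    let rec go p = if p >= n then p else go (p * 2) in
    go 1

  let stack_for size =
    match Hashtbl.find_opt pools size with
    | Some s -> s
    | None ->
        let s = Stack.create () in
        Hashtbl.add pools size s;
        s

  (* returns the buffer together with a closure that releases it *)
  let acquire requested =
    let size = round_up requested in
    let s = stack_for size in
    let buf = if Stack.is_empty s then Bytes.create size else Stack.pop s in
    (buf, fun () -> Stack.push buf s)
end
```

A caller would then do something like `let buf, release = Buffer_pool.acquire 4096 in ...; release ()`, calling `release` once the connection or response is done with the buffer.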
The changes are minimal, but reviews are appreciated. In particular:
- I haven't tested deflatemod yet
- I made a mistake in my first attempt at addressing the last-chunk issue (I forgot that the buffer size could be rounded up to the next power of two; see the sketch after this list), so any extra eyeballs looking for errors would help
- I had to release the response stream resources (i.e. buffers) in Ocsigen_server; it's literally 3 lines, but it's quite big conceptually
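On the last-chunk point, the rough idea (sketched below with hypothetical names, not the code in the PR) is that the amount read into a pooled buffer has to be bounded by the buffer's actual length, which may be larger than the size that was requested because of the power-of-two rounding mentioned above.

```ocaml
(* Illustrative only: fill a pooled buffer with the next chunk of a
   file.  [remaining] is the number of bytes left to send; the buffer
   may be bigger than requested (rounded up to a power of two), so the
   read is bounded by the minimum of both.  The sender then emits only
   the first [n] bytes of [buf] instead of copying them into a smaller,
   freshly allocated buffer. *)
let read_chunk fd buf ~remaining =
  let to_read = min remaining (Bytes.length buf) in
  Unix.read fd buf 0 to_read
```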
Overall, these changes make some requests up to ~25-30% faster.
Performance
Effect of buffer pools
(each pair of figures reads: buffer pool disabled with OCSIGEN_DISABLE_BUFFER_POOL -> enabled; o=150 is the GC setting used in half of the runs)
404                          12000 -> 12800
  o=150                      12900 -> 13500
normal response (20 bytes)   ~8100 -> 8900
  o=150                      ~8400 -> 8900
16kb file ->                  6721 -> 7800
  o=150                       7100 -> 7900
Effect of last-chunk optimization when serving files
PRE                           POST
----------------------        ----------------------
15kb file ->                  15kb file ->
  6200 -> 7400                  6400 -> 7950
  o=150 7000 -> 7800            o=150 7200 -> 8200
7kb file ->                   7kb file ->
  7500 -> 8550                  7450 -> 9150
  o=150 7650 -> 8700            o=150 7850 -> 9250
1kb file ->                   1kb file ->
  8100 -> 8750                  8100 -> 8750
  o=150 8400 -> 8800            o=150 8550 -> 9000
----------------------        ----------------------
Overall effect of the Lwt_chan fix (#49) + buffer pools + last-chunk optimization (the latter only affects files): compare the figures at the top and those in the POST column above against these numbers for 2.6:
2.6
--------------------------
404                  11800
  o=150              12500
20-byte response      7450
  o=150               7900
16kb file ->          5500
  o=150               6100
15kb file ->          5200
  o=150               5900
7kb file ->           6000
  o=150               6700
1kb file ->           7650
  o=150               7950
--------------------------
AFAICS it builds & installs fine w/ jenkins (certainly builds for me). Same (phony?) uninstall error as in #84.
Thanks for doing this. I haven't read the last-chunk part carefully, but the rest of the code looks OK to me. The overhead (hashtable, stacks, release closures) should be insignificant for buffers of size 1K or more.
I can install and uninstall (the buffer-pool branch) just fine locally, it is hard to say why rm fails inside the docker container.
On Mon, Aug 17, 2015 at 05:38:28AM -0700, Vasilis Papavasileiou wrote:
> Thanks for doing this. I haven't read the last-chunk part carefully, but the rest of the code looks OK to me.
Thank you for taking a look :)
> The overhead (hashtable, stacks, release closures) should be insignificant for buffers of size 1K or more.
Indeed. In this case there's a speed increase across the board for any response size (even 404s and 20-byte responses become a bit faster) -- ocsigenserver cannot know a priori that a connection will not make full use of the connection buffers, so it's got to use large ones.
For hashtables of such size (a few dozen elements at most) I've measured over 4e7 lookups/s (we only need a handful of them per request, so it's over 3 orders of magnitude away :). The closures are essentially free (with Lwt, we're building much heavier ones all the time anyway), and the only thing I didn't know how to quantify easily was the caml_modify inherent in Stack.push, but it clearly involves less work than the mark&sweeping we'd have if we were allocating 4KB buffers left and right.
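(For anyone curious, a throwaway micro-benchmark along the lines of the sketch below, with made-up setup, is enough to reproduce that order of magnitude; exact numbers will obviously vary by machine.)

```ocaml
(* Rough micro-benchmark sketch: lookups in a small Hashtbl comparable
   in size to the per-size pool table discussed above. *)
let () =
  let tbl = Hashtbl.create 16 in
  (* a few entries, keyed by power-of-two buffer sizes *)
  List.iter (fun k -> Hashtbl.add tbl k k)
    [ 1024; 2048; 4096; 8192; 16384; 32768 ];
  let iters = 10_000_000 in
  let t0 = Sys.time () in
  for _i = 1 to iters do
    ignore (Hashtbl.find tbl 4096)
  done;
  let dt = Sys.time () -. t0 in
  Printf.printf "%.2e lookups/s\n" (float_of_int iters /. dt)
```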
> I can install and uninstall (the buffer-pool branch) just fine locally, it is hard to say why rm fails inside the docker container.
I'm told it's an issue in the build script used for PRs.
Mauricio Fernández