pageserver: use direct IO for delta and image layer reads
Part of #8130
Problem
Pageserver previously went through the kernel page cache for all IO. The kernel page cache makes a lightly loaded pageserver look deceptively fast. Using direct IO would give our virtual file IO operations predictable latencies.
For reads in particular, the data pages have extremely low temporal locality, because the most frequently accessed pages are already cached on the compute side.
Summary of changes
This PR enables pageserver to use direct IO for delta layer and image layer reads. We can ship them separately because these layers are write-once, read-many, so we will not be mixing buffered IO with direct IO.
- implement `IoBufferMut`, a buffer type with aligned allocation (alignment currently set to 512).
- use `IoBufferMut` at all places we are doing reads on image + delta layers.
- leverage the Rust type system and use the `IoBufAlignedMut` marker trait to guarantee that the input buffers for the IO operations are aligned.
- page cache allocation is also made aligned.

* in-memory layer reads and the write path will be shipped separately.
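
The combination of an aligned buffer type and a marker trait described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `AlignedBuf`, `AlignedIoBuf`, and `read_into` are hypothetical names invented for the example, and the only detail taken from the description is the 512 alignment. The idea is that a read entry point bounded on the marker trait cannot be called with an arbitrary `Vec<u8>`, so misaligned buffers are rejected at compile time.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::io::{self, Read};
use std::ptr::NonNull;

/// Alignment used for the sketch; the PR description mentions 512.
pub const ALIGN: usize = 512;

/// Hypothetical aligned buffer (stand-in for the idea behind `IoBufferMut`).
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocates a zeroed buffer; capacity is `min_len` rounded up to a multiple of ALIGN.
    pub fn with_capacity(min_len: usize) -> Self {
        // ALIGN is a power of two, so masking rounds up correctly.
        let cap = (min_len.max(1) + ALIGN - 1) & !(ALIGN - 1);
        let layout = Layout::from_size_align(cap, ALIGN).expect("valid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: the allocation is `layout.size()` zero-initialized bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: same as `as_slice`, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

/// Hypothetical marker trait: implementors promise that the start address
/// and length of the returned slice are multiples of ALIGN.
pub unsafe trait AlignedIoBuf {
    fn bytes_mut(&mut self) -> &mut [u8];
}

// SAFETY: `AlignedBuf` allocates with ALIGN alignment and an ALIGN-multiple size.
unsafe impl AlignedIoBuf for AlignedBuf {
    fn bytes_mut(&mut self) -> &mut [u8] {
        self.as_mut_slice()
    }
}

/// Read entry point that only accepts aligned buffers.
/// A plain `Vec<u8>` does not implement `AlignedIoBuf`, so it fails to compile here.
pub fn read_into<R: Read, B: AlignedIoBuf>(src: &mut R, buf: &mut B) -> io::Result<usize> {
    src.read(buf.bytes_mut())
}
```

In this sketch, `AlignedBuf::with_capacity(1000)` yields a 1024-byte buffer whose start address is a multiple of 512. Making the trait `unsafe` puts the alignment promise on the implementor, so callers of `read_into` do not need runtime alignment checks.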
Testing
Integration test suite run with O_DIRECT enabled: https://github.com/neondatabase/neon/pull/9350
Performance
We evaluated performance using the `get-page-at-latest-lsn` benchmark. The results show a decrease in the number of IOPS, no significant change in mean latency, and a slight improvement in the p99.9 and p99.99 latencies.
Rollout
We will roll out `virtual_file_io_mode=direct` region by region to enable direct IO on image + delta layers.
Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist