
pageserver: direct I/O

Open jcsp opened this issue 2 months ago • 1 comments

Project Slack: #proj-pageserver-direct-io


Currently, we do buffered reads of data pages. Direct reads would be a better fit, because:

  • Pageserver data pages have extremely low temporal locality on reads, because any repeatedly accessed pages are already cached inside postgres. Caching them again in the kernel page cache is therefore largely a waste of memory, which we could be using for other things.
  • The kernel page cache gives deceptively fast read performance on lightly loaded pageservers, making performance less consistent as pageservers are packed with larger numbers of tenants.
### Tasks
- [ ] predicting/testing perf impact of direct IO (inject delays into reads that currently hit RAM?) -- test the assumption that disks are fast enough and our bottleneck is mostly software.
- [ ] changing the code base to use 4k-aligned buffers everywhere, and validating this by having VirtualFile panic!() if it sees an unaligned buffer
- [ ] feature-flagged switch to issue IOs with O_DIRECT
- [ ] Improve userspace PageCache monitoring, where we store index pages & expect to have a high hit rate
- [ ] Reap the benefits of no longer using so much RAM for kernel page cache: make PageCache (and any other userspace caches we choose to add) larger.
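
A rough sketch of what the aligned-buffer task could look like, using only the Rust standard library. `AlignedBuf`, `DIO_ALIGN`, `is_dio_aligned` and `assert_dio_aligned` are hypothetical names for illustration, not existing pageserver APIs, and the 4096-byte alignment is an assumption: the real requirement depends on the filesystem and the device's logical block size.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Assumed alignment for direct I/O; the real value depends on the
/// filesystem and the device's logical block size.
const DIO_ALIGN: usize = 4096;

/// Hypothetical heap buffer whose address and length are both
/// multiples of DIO_ALIGN.
struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        assert!(
            len > 0 && len % DIO_ALIGN == 0,
            "length must be a non-zero multiple of DIO_ALIGN"
        );
        let layout = Layout::from_size_align(len, DIO_ALIGN).expect("valid layout");
        // SAFETY: `layout` has a non-zero size (checked above).
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, layout }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized (zeroed) bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

/// True if the buffer address, buffer length, and file offset are all
/// multiples of DIO_ALIGN.
fn is_dio_aligned(buf: &[u8], offset: u64) -> bool {
    buf.as_ptr() as usize % DIO_ALIGN == 0
        && buf.len() % DIO_ALIGN == 0
        && offset % DIO_ALIGN as u64 == 0
}

/// The kind of check an I/O wrapper could run before issuing a read,
/// panicking on any unaligned request.
fn assert_dio_aligned(buf: &[u8], offset: u64) {
    if !is_dio_aligned(buf, offset) {
        panic!(
            "unaligned direct I/O request: ptr={:p} len={} offset={}",
            buf.as_ptr(),
            buf.len(),
            offset
        );
    }
}
```

Running a check like this before the O_DIRECT switch is flipped would surface unaligned callers as panics in tests, rather than as `EINVAL` errors from the kernel once direct I/O is enabled.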

Backpointer to the Slack DMs between John and Christian about this: https://neondb.slack.com/archives/D05KTCVS40H/p1718977335312439

jcsp, Jun 21 '24 15:06