cheribsd
cheribsd copied to clipboard
Add more VM fast-out mechanism
The revoker needs to touch every real page in an address space. However, there isn't an extant mechanism to ask, in the general case, for the page at an address other than try vm_page_grab_valid
, and, failing that, to have vm_fault
do its thing. (In some cases we can use vm_page_find_least
to jump over holes in objects and avoid filling them with zero mappings, but that doesn't work for, e.g., CoW mappings.)
vm_page_grab_valid
has a VM_ALLOC_NOCREAT
(there's that missing e again) which is almost the right behaviour but we'd like a slightly weaker flag that doesn't create new pages but does fully reconstitute pages currently only partially valid.
vm_fault
, because it's designed to do what it says on the tin, will, left to its own devices, allocate and map zero pages. We'd like to signal to vm_fault
that it doesn't need to do that, and we need its response to unambiguously signal whether it would have allocated a zero page but for our request vs. a true error that needs response.
https://reviews.freebsd.org/D30863 and https://reviews.freebsd.org/D30864
I don't quite understand VM_ALLOC_NOZERO, or at least, its use in the caprevoke branch seems wrong to me. The difference with NOCREAT is that a resident, invalid page will be paged in if available. (Partially valid pages only arise in the filesystem buffer cache, which I believe are never cap-bearing...?) But the presence of an invalid page in the top-level object doesn't mean anything special; such pages can arise due to transient conditions (see commit https://cgit.freebsd.org/src/commit/?id=4bf95d00cebf4d188d71db759fa492eb6f00b197 in FreeBSD). In particular, the top-level object may still have a backing object, and during a revocation scan I believe we still need to traverse the object chain by calling vm_fault().
Oh, yea, that does seem like it's probably wrong, upon re-reading. Backing up, the intent was to use vm_page_grab_valid
to traverse the existing pages without filling in gaps (with CoW mappings of zero pages), and fall back to vm_fault
for anything that we couldn't readily figure out quickly. That's not quite what the code as written does, tho', is it, since we still need to check vm_pager_has_page
even if vm_page_lookup
has failed. We should only be taking the early out if NOZERO
is set and the test a few lines down (vm_pager_has_page(object, pindex, NULL, &after)
) is false, but we'd like to do that test before allocating the page.
Does that seem more sensible?
I tend to think that any case where there is no (valid) page resident in the top-level object is one where we have to defer to vm_fault()
. Even if the pager has a copy of the page, note that we end up dropping the object lock to perform the page-in, which in principle requires re-validation of any object state that we care about (probably not a problem here on its own?), and in the current code we will hold the vm_map lock across the page-in as well, blocking all modifications to the map. From a correctness perspective this might be ok for the subset of objects which may contain capabilities (it's not ok at least for OBJT_VNODE objects), but it still would be an awfully long time to hold the map lock. I could easily be missing something here, but it's not clear to me that there's a significant performance gain to be had from trying to avoid vm_fault() in situations where we have to do a page-in anyway. (And if we do use vm_page_grab_valid() to perform page-ins, I suspect it would be useful to use VM_ALLOC_COUNT() in the parameters to request some amount of read-ahead from the pager, otherwise we are paging in 4KB at a time.)
I'm sorry, I forgot that the fallback path does a vm_fault(VM_FAULT_NOFILL)
, not a regular VM fault. Now your proposal makes more sense to me: we look in the top-level object's radix tree, and if that fails to turn anything up, we ask the pager if it has the page, and if not we can possibly jump to the next page for the QUICK_SUCCESSOR optimization[*]. So the problem is just that vm_page_grab_valid(VM_ALLOC_NOZERO) can return an invalid page if the pager failed to turn anything up, I believe. (Well, I still think we don't want to perform a page-in when _NOZERO is specified, but in practice that'll be a corner case.)
[*] Doesn't this let you extend the QUICK_SUCCESSOR optimization to OBJT_SWAP objects, so long as they don't have backing objects? That would help with POSIX shared memory objects, which are instantiated as OBJT_SWAP, not OBJT_DEFAULT.
I believe this has been overcome by events.