btree: virtualize page allocation methods

Open psarna opened this issue 2 years ago • 5 comments

This commit abstracts away the page allocation methods, which were originally based on a simple freelist: the page number of the first freelist trunk page is persisted at offset 32 of the database header, and the freelist's total page count at offset 36. The new code allows redefining the allocation methods. The target use case is mitigating contention when using an optimistic lock-free storage layer.

The overridden allocation methods do not assume access to internal SQLite structures, and should instead implement their own persistence mechanisms.

Fixes #11

TODO:

  • [ ] tests based on a custom VFS implementation, making sure that all code assumptions about the contents of offsets 32, 36 still hold
  • [ ] tests checking that the interface is foolproof enough, e.g. that it requires both page allocation and deallocation to be overridden, and not just one
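The "both or neither" requirement from the second TODO item could be checked roughly as in the sketch below. The `PageAllocMethods` struct, its field signatures, and `pageAllocMethodsValid()` are hypothetical names made up for illustration; they are not the interface from this PR.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Pgno;  /* stand-in for SQLite's 32-bit page number type */

/* Hypothetical pair of hooks; the names and signatures are assumptions. */
typedef struct PageAllocMethods {
  int (*xAllocatePage)(void *pCtx, Pgno *pPgno); /* hand out a page number */
  int (*xFreePage)(void *pCtx, Pgno pgno);       /* take a page number back */
  void *pCtx;                                    /* provider-private state */
} PageAllocMethods;

/* Return 1 if the hooks are usable: either both are set (custom allocator)
** or both are NULL (fall back to the built-in freelist). Return 0 if only
** one of them is overridden. */
static int pageAllocMethodsValid(const PageAllocMethods *p){
  assert( p!=NULL );
  return (p->xAllocatePage==NULL) == (p->xFreePage==NULL);
}

/* Trivial stub providers, only here so the check can be exercised. */
static int stubAllocatePage(void *pCtx, Pgno *pPgno){
  (void)pCtx;
  *pPgno = 2;
  return 0;
}
static int stubFreePage(void *pCtx, Pgno pgno){
  (void)pCtx;
  (void)pgno;
  return 0;
}
```

A registration path could reject a half-overridden pair up front with this check, so a mismatch never reaches the btree code.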

psarna avatar Oct 05 '22 13:10 psarna

self-note: for multiple reasons, ranging from easier backward compatibility to reducing the number of parameters passed around, I think it ultimately makes sense to add these custom allocation functions to the vfs interface. I'll do that tomorrow. Next in line is producing tests to validate that overriding the interface with another implementation, e.g. a stub based on keeping an in-memory freelist, keeps working, and that no assertions are triggered by the fact that offsets 32 and 36 aren't used anymore.
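A stub based on an in-memory freelist might look roughly like the sketch below. Everything in it (`MemFreelist`, `memAllocatePage`, `memFreePage`, the `MEMFL_*` codes) is a hypothetical illustration, not code from this PR; it keeps freed page numbers on a stack and extends the file by one page when the stack is empty.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Pgno;  /* stand-in for SQLite's 32-bit page number type */

/* Local result codes for this sketch only. */
#define MEMFL_OK    0
#define MEMFL_ERROR 1
#define MEMFL_NOMEM 2

/* Hypothetical in-memory freelist: nothing is persisted at offsets 32/36. */
typedef struct MemFreelist {
  Pgno *aFree;    /* stack of free page numbers */
  size_t nFree;   /* number of entries in aFree[] */
  size_t nCap;    /* allocated capacity of aFree[] */
  Pgno mxPage;    /* highest page number handed out so far */
} MemFreelist;

/* Reuse a freed page if one exists, otherwise grow by one page. */
static int memAllocatePage(MemFreelist *p, Pgno *pPgno){
  assert( p!=NULL && pPgno!=NULL );
  if( p->nFree>0 ){
    *pPgno = p->aFree[--p->nFree];
  }else{
    *pPgno = ++p->mxPage;
  }
  return MEMFL_OK;
}

/* Push a page number back onto the stack. Double frees are not detected. */
static int memFreePage(MemFreelist *p, Pgno pgno){
  assert( p!=NULL );
  if( pgno==0 || pgno>p->mxPage ) return MEMFL_ERROR;
  if( p->nFree==p->nCap ){
    size_t nNew = p->nCap ? p->nCap*2 : 16;
    Pgno *aNew = realloc(p->aFree, nNew*sizeof(Pgno));
    if( aNew==NULL ) return MEMFL_NOMEM;
    p->aFree = aNew;
    p->nCap = nNew;
  }
  p->aFree[p->nFree++] = pgno;
  return MEMFL_OK;
}
```

For a test, `mxPage` would be seeded with the current database size, so that freshly allocated pages never collide with existing ones.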

psarna avatar Oct 05 '22 18:10 psarna

@losfair fyi. I'm not done with setting up the tests, so can't verify if it works yet, but feel free to experiment with mvsqlite integration

psarna avatar Oct 06 '22 10:10 psarna

(just found a segfault with manual tests, will fix soon)

psarna avatar Oct 06 '22 15:10 psarna

done

psarna avatar Oct 06 '22 16:10 psarna

Testing the change automatically will get more complex (as expected), because page 1 of the database contains specific metadata that needs to be updated too, e.g. the total number of pages at offset 28. Ref: https://www.sqlite.org/fileformat.html#the_database_header . Updating it could either be left to the xAllocatePage provider as a hack or be taken care of automatically; I prefer the latter. Perhaps the page allocation routine should simply inspect the database header and bump the number of pages if xAllocatePage returned a number larger than the current maximum.
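The "bump the page count automatically" idea can be sketched as follows, operating on a raw copy of the 100-byte database header. The helper names (`hdrGet32`, `hdrPut32`, `bumpPageCount`) are hypothetical; the only facts taken from the file format are that the database size in pages is a big-endian 32-bit integer at offset 28. A real integration would have to go through the pager to make page 1 writable, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_OFFSET_DBSIZE 28  /* database size in pages, 4 bytes, big-endian */

/* Read a big-endian 32-bit integer. */
static uint32_t hdrGet32(const unsigned char *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16)
       | ((uint32_t)p[2]<<8)  |  (uint32_t)p[3];
}

/* Write a big-endian 32-bit integer. */
static void hdrPut32(unsigned char *p, uint32_t v){
  p[0] = (unsigned char)(v>>24);
  p[1] = (unsigned char)(v>>16);
  p[2] = (unsigned char)(v>>8);
  p[3] = (unsigned char)v;
}

/* Hypothetical post-allocation step: if the page number returned by the
** custom allocator exceeds the page count recorded in the header, raise
** the recorded count. Returns 1 if the header was modified, 0 otherwise. */
static int bumpPageCount(unsigned char *aHdr, uint32_t pgno){
  uint32_t nPage;
  assert( aHdr!=0 );
  nPage = hdrGet32(&aHdr[HDR_OFFSET_DBSIZE]);
  if( pgno>nPage ){
    hdrPut32(&aHdr[HDR_OFFSET_DBSIZE], pgno);
    return 1;
  }
  return 0;
}
```

With this in the generic allocation path, an xAllocatePage provider would only need to return page numbers and would not have to touch the header itself.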

This is going to be an extremely educational journey!

psarna avatar Oct 06 '22 17:10 psarna

@losfair would love to get some testing / comments on this from you before we put it out of draft!

glommer avatar Oct 25 '22 17:10 glommer

btw: I need to find a time slot to retest this more thoroughly, because I suspect the routines might also need to update the number of pages the btree claims to use (pBt->nPage) once there's an allocation. Or, just make the interface a little more intrusive and simply allow users to implement the whole allocation routine which uses internal data structures, i.e.

int xAllocatePage(
  BtShared *pBt,         /* The btree */
  MemPage **ppPage,      /* Store pointer to the allocated page here */
  Pgno *pPgno,           /* Store the page number here */
  Pgno nearby,           /* Search for a page near this one */
  u8 eMode               /* BTALLOC_EXACT, BTALLOC_LT, or BTALLOC_ANY */
)

That's a little ugly, because sqlite3_vfs does not require any of these types to be known, so perhaps it's enough to just make sure that the total number of pages is updated each time a new page is allocated.

psarna avatar Oct 27 '22 13:10 psarna

edit: I think nPage is fine, but I posted a fix (v2):

  • release page from the pager unconditionally after taking it, as should always be done once you're done with using a page

psarna avatar Oct 27 '22 18:10 psarna

I am closing this because no users appeared for the page allocation virtualization. We can resurrect this later if needed.

penberg avatar Sep 07 '23 12:09 penberg