sqlite-s3vfs

dynamo backend

Open: LorenzoBoccaccia opened this issue 1 year ago, 5 comments

I've built a version using dynamodb as backend using this as foundation, with (experimental) locking support. Would you be interested in merging the work back? I can create a pull request and iterate on it as needed.

https://github.com/LorenzoBoccaccia/sqlite-s3vfs

or the locking there can be used to make the s3 backend optionally write consistent by using a ddb table in combination

LorenzoBoccaccia, Apr 25 '24 20:04

Hi @LorenzoBoccaccia,

I've built a version using dynamodb as backend using this as foundation

I think using DynamoDB as a backend for the data itself is pretty cool. But having had a think, I'm going to say this is beyond the scope of this project: the chance of us using it in the short to medium term is pretty slim. Not that us using it is the only criterion for having changes merged in, but I think it's just too "far away" in some sense from something we would use, and so maintain. My suggestion is to keep it as a separate project that, for example, you maintain.

or the locking there can be used to make the s3 backend optionally write consistent by using a ddb table in combination [...] I can create a pull and iterate over the request as needed

But the locking, I think I am quite interested in. Are you able to raise a PR with just that?

But...

I do suspect there will be quite a lot of discussion, and so (as you suggest) iteration, on this before it gets merged. Essentially we will have to make sure it covers various cases: the worst of this would be clients going away while still having things locked. I'll probably have to read up a bit on locks, especially distributed locking, and also remind myself how SQLite locking works. I have written https://github.com/michalc/sqlite-memory-vfs/blob/main/sqlite_memory_vfs.py#L137, which handles the (much?) simpler case of locking a file in memory. Just for background, I settled on a Python mutex to wrap all access to the "global" (in the sense of the VFS) locks for a particular file. And I do now realise it probably doesn't handle the case of a client going away while it holds a SQLite EXCLUSIVE lock on the file...
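For background on the SQLite side, the lock levels a VFS has to track (NONE, SHARED, RESERVED, PENDING, EXCLUSIVE) and a single mutex guarding the per-file lock state might look roughly like this. It is a minimal sketch assuming one process with threads as the clients; `FileLocks`, `lock` and `unlock` are hypothetical names, not the code in sqlite_memory_vfs.py. Like the approach described above, it does nothing about a client that goes away while holding a lock.

```python
import threading

# Lock levels, mirroring the values of SQLite's SQLITE_LOCK_* constants
NONE, SHARED, RESERVED, PENDING, EXCLUSIVE = range(5)


class FileLocks:
    """Hypothetical lock state for one file, shared by all connections, guarded by one mutex."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._levels = {}  # connection id -> lock level it currently holds

    def _others(self, conn):
        return [lvl for c, lvl in self._levels.items() if c != conn]

    def lock(self, conn, level):
        """Try to raise conn's lock to `level`; return True on success, False if busy."""
        with self._mutex:
            current = self._levels.get(conn, NONE)
            if level <= current:
                return True
            others = self._others(conn)
            if level == SHARED:
                # New readers are refused while another connection is PENDING or EXCLUSIVE
                if any(lvl >= PENDING for lvl in others):
                    return False
            elif level == RESERVED:
                # Only one connection may hold RESERVED (or higher) at a time
                if current < SHARED or any(lvl >= RESERVED for lvl in others):
                    return False
            elif level == EXCLUSIVE:
                if current < SHARED or any(lvl >= RESERVED for lvl in others):
                    return False
                if any(lvl >= SHARED for lvl in others):
                    # Readers still active: move to PENDING so no new readers can join
                    self._levels[conn] = PENDING
                    return False
            else:
                raise ValueError("SQLite never requests PENDING explicitly")
            self._levels[conn] = level
            return True

    def unlock(self, conn, level):
        """Lower conn's lock to `level` (SHARED or NONE)."""
        with self._mutex:
            if level == NONE:
                self._levels.pop(conn, None)
            elif self._levels.get(conn, NONE) > level:
                self._levels[conn] = level
```

The single mutex keeps each check-and-update atomic, which is the property a distributed version would have to recreate with something like conditional writes.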

And then somehow the PR would have to have tests to cover especially the non-happy-path cases.

Thanks,

Michal

michalc, Apr 27 '24 08:04

Are you able to raise a PR with just that?

Sure, I'll cut that part out into its own PR.

the worst of this would be clients going away while still having things locked.

Yeah, currently it's happy path only, but I've been testing with a bunch of writers ingesting Wikipedia into FTS5, and as long as it's the happy path, it works. I'm fine putting some work into handling recovery.
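One common way to make such a lock recoverable is a lease: the lock record carries an owner and an expiry time, the holder renews it while it works, and anyone may take over a lease that has expired. The sketch below assumes that approach. `LeaseTable` is a hypothetical in-memory stand-in for a table with conditional writes (in DynamoDB the equivalent would be a conditional PutItem/DeleteItem); it is not code from either fork.

```python
import threading
import time


class LeaseTable:
    """Hypothetical in-memory table of leases with 'conditional write' semantics."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()  # makes each check-and-write atomic
        self._items = {}  # lock name -> (owner, expires_at)

    def acquire(self, name, owner, ttl):
        """Take or renew the lease if it is free, expired, or already ours."""
        with self._mutex:
            now = self._clock()
            item = self._items.get(name)
            if item is None or item[1] <= now or item[0] == owner:
                self._items[name] = (owner, now + ttl)
                return True
            return False

    def release(self, name, owner):
        """Delete the lease, but only if we still own it."""
        with self._mutex:
            item = self._items.get(name)
            if item is not None and item[0] == owner:
                del self._items[name]
                return True
            return False


# Usage with a controllable clock: writer-1 stops renewing, so writer-2 can take over
now = [0.0]
table = LeaseTable(clock=lambda: now[0])
assert table.acquire("db", "writer-1", ttl=10)
assert not table.acquire("db", "writer-2", ttl=10)
now[0] = 11.0  # writer-1's lease has expired
assert table.acquire("db", "writer-2", ttl=10)
```

A lease on its own is not fully safe: a writer that stalls past its expiry may still believe it holds the lock. A commonly suggested mitigation is a fencing token, an increasing number issued with each lease that the storage side checks before accepting a write.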

One thing I've found is that SQLite absolutely doesn't respect page size, which is fine on S3 but gets expensive on DynamoDB, as you end up doing unaligned writes and reads.
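To make the cost concrete: if the backend stores the file as fixed-size blocks (say one item per block), a write that isn't aligned to the block size touches extra blocks, and each partially covered block must be read before it can be rewritten. A rough sketch assuming that layout; `blocks_touched` and `write_cost` are hypothetical helpers. One plausible source of such writes is the rollback journal, whose page records are a 4-byte page number, the page, and a 4-byte checksum, so page_size + 8 bytes each.

```python
def blocks_touched(offset, length, block_size):
    """Indexes of the fixed-size blocks covered by the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def write_cost(offset, length, block_size):
    """Block reads and writes needed, assuming partially covered blocks need read-modify-write."""
    blocks = blocks_touched(offset, length, block_size)
    end = offset + length
    partial = 0
    for b in blocks:
        block_start, block_end = b * block_size, (b + 1) * block_size
        if offset > block_start or end < block_end:
            partial += 1  # this block must be fetched before it can be rewritten
    return {"reads": partial, "writes": len(blocks)}


print(write_cost(4096, 4096, 4096))  # aligned page: {'reads': 0, 'writes': 1}
print(write_cost(100, 4096, 4096))   # same size, unaligned: {'reads': 2, 'writes': 2}
print(write_cost(24, 4, 4096))       # small header update: {'reads': 1, 'writes': 1}
```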

LorenzoBoccaccia, Apr 27 '24 17:04

I am ready to help test this PR as soon as it is available.

refacktor, Apr 27 '24 22:04

one thing I've found is that sqlite absolutely don't respect page size

Oh! I don't think I've ever witnessed this: can you give more detail?

(Maybe I've seen it just on the first page with the first 100 bytes? Not sure... maybe I'm just thinking of the initial read...)

michalc, Apr 28 '24 08:04

I put some detail in PR #27. Basically, page sizes are rounded to powers of 2, and they are meant for alignment and as invariants more than as units of block writes; that's why an arbitrary xSectorSize wasn't working. The relevant doc is in the PR.
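The point about powers of 2 can be checked with a little arithmetic. Assuming the backend splits the file into fixed-size blocks (a hypothetical layout, with hypothetical helper names, not taken from PR #27), a page-aligned page access lands cleanly on blocks whenever both sizes are powers of two, because then one size always divides the other. With an arbitrary size such as 3000 bytes that no longer holds.

```python
SQLITE_MIN_PAGE_SIZE = 512    # SQLite page sizes are powers of two in [512, 65536]
SQLITE_MAX_PAGE_SIZE = 65536


def is_valid_page_size(n: int) -> bool:
    """True if n is a power of two within SQLite's page size range."""
    return SQLITE_MIN_PAGE_SIZE <= n <= SQLITE_MAX_PAGE_SIZE and n & (n - 1) == 0


def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def is_clean(offset: int, length: int, block_size: int) -> bool:
    """True if [offset, offset + length) fits in one block or starts and ends on block boundaries."""
    end = offset + length
    within_one_block = offset // block_size == (end - 1) // block_size
    block_aligned = offset % block_size == 0 and end % block_size == 0
    return within_one_block or block_aligned


# With power-of-two page and block sizes, every page-aligned page access is clean,
# whichever of the two sizes is larger.
for page in (512, 4096, 65536):
    assert is_valid_page_size(page)
    for block in (512, 4096, 65536):
        assert all(is_clean(i * page, page, block) for i in range(64))

# An arbitrary block size such as 3000 breaks this; rounding it up restores it.
assert not is_clean(4096, 4096, 3000)
assert round_up_pow2(3000) == 4096
```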

LorenzoBoccaccia, Apr 29 '24 18:04