cosmopolitan icon indicating copy to clipboard operation
cosmopolitan copied to clipboard

Proof of concept of sqlite serialization

Open oltdaniel opened this issue 2 years ago • 7 comments

Inspired by the StoreAsset function in order to store any file within the APE itself, I wondered if this is also possible with the SQLite file. Currently it is still required to store the file on the host itself and then adding it to the APE file afterwards.

In order to avoid too much work, I ignored the possibility of implementing a custom SQLite VFS that would be a communication layer between the APE file and the SQLite library. This should be in theory possible as well, but requires a lot of tuning to avoid moving the ZIP file content around too much. Thereby, the simple workaround with sqlite3_serialize and sqlite3_deserialize was enough. This required compiling the SQLite library with the SQLITE_ENABLE_DESERIALIZE and exposing the mentioned functions via lsqlite3.c to the Lua machine.

Testing

The demo has been extended with two files. Here a explanation what they do:

  • sql-backupstore.lua stores the SQLite database from .init.lua within the APE file as an backup and returns the stored size.
  • sql-backup.sql queries the stored database by loading it and executing the same query used in sql.lua

ToDo

  • [ ] fix style issues (if requested, clang-format config has been followed where possible)
  • [ ] learn more about the zSchema parameter to make correct use of sqlite3_serialize/sqlite3_deserialize
  • [ ] any documentation required for these new exposed functions?

oltdaniel avatar Jun 20 '22 20:06 oltdaniel

Love that! @jart, this should be sufficiently simple to include. I've also been looking for a way to keep in-memory and on-disk images in sync (without worrying too much about a possible loss; mostly for performance reasons) and this should enable that support as well.

pkulchenko avatar Jun 20 '22 21:06 pkulchenko

Nice contribution. I love the idea. Implementation looks clean. Thank you!

Theoretical feature request:

If the app crashes, you lose all changes since the redbean started. That's fine, the archived copy is safe. What are the things to consider if one wanted to perform a save on the db while the redbean is running? App specific behavior for sure.

I am just imagining:

  1. serialize
  2. deserialize
  3. user provided callback or something like that to initialize app logic which relies on sqlite

is there anything in redbean itself which needs to know? could the programmer put themselves in a situation where they do an incremental save and redbean cannot handle working with the newly deserialized :memory: db? I'm thinking of stale refs to old db laying around maybe?

I don't know if this is feasible, just trying to think it through out loud.

thetanil avatar Jun 22 '22 09:06 thetanil

Thanks for the Feedback.

If the app crashes, you lose all changes since the redbean started. That's fine, the archived copy is safe. What are the things to consider if one wanted to perform a save on the db while the redbean is running? App specific behavior for sure.

That is a good point. I haven't done this kind of testing yet as I wanted to check first, if I'm not the only one that would find this feature nice. Yes this would be App specific or Framework specific depending on the project. We would just expose the bare functions (as this is the point of Redbean as @jart mentioned in the presentation about it). I will test saving the database "during" writes by just calling both endpoints with wrk and see if anything has been corrupted.

I am just imagining:

  1. serialize
  2. deserialize
  3. user provided callback or something like that to initialize app logic which relies on sqlite

is there anything in redbean itself which needs to know? could the programmer put themselves in a situation where they do an incremental save and redbean cannot handle working with the newly deserialized :memory: db? I'm thinking of stale refs to old db laying around maybe?

I would think that this feature is more or less used for write only during runtime and only is reading on the first initialization of redbean. But this would be an app specific functionality again. Redbean would just expose the possibility to actually serialize/deserialize a database. How you handle these functions within your application should be restricted to an example only as done with the SQLite database usage across different routes.

--

I think I'll test the usage of the database outside of the .init.lua and see how it behaves and if we run into any issues regarding data corruption with these SQLite functions themselves. Any more logic would probably live in an lua module. We could write that to have a cleaner .init.lua as it is quite bloated right now. But as this is the first work ever I've done with SQLite and Lua so I would need some more time reading the documentation.

oltdaniel avatar Jun 22 '22 10:06 oltdaniel

I definitely think this would be useful for not only you and I but many others. I didn't realize that you're calling the sqlite api directly from init.lua. LoadSqlite and StoreSqlite I assumed were the public api for something hidden away more. Obviously now, it's all under the app devs control so they can do whatever they like. If they do get packed away into an API I think the callback for application (re)init still makes sense then. Anyway, keep it up, this is really nice!

thetanil avatar Jun 23 '22 14:06 thetanil

Update on this feature. Sadly SQLite does not implement the use of a in-memory database across different processes. Thereby, this feature is only partially useful, as it would only allow changing the stored database within the APE file upon request, but not actually restoring a database to use it as an in memory one across the processes.

Where to go from here? As mentioned in the initial comment of this PR, there is the option to implement and VFS for SQLite that would work as an layer between the APE File and SQLite itself. If this can be implemented, the full feature-set of SQLite is supported.

The following resources can be read to determine this in-memory database behavior:

  • "allows two or more database connections in the same process to have access to the same in-memory database." (cited https://www.sqlite.org/sharedcache.html section 6)
  • https://redbean.dev/#lsqlite3 The fork architecture of readbean means, that only the database content created in the .init.lua are valid across different calls. No additional writes would make it back to this initialized database.

oltdaniel avatar Jun 23 '22 17:06 oltdaniel

Sadly SQLite does not implement the use of a in-memory database across different processes.

@oltdaniel, I think there are still some interesting scenarios that can be supported even with the current implementation.

  1. If you have the main process that does writing to the in-memory DB, then all forked processes can read that DB using already opened connections. This seems to work in my testing and I don't see why it wouldn't be working in general. For obvious reasons, as you noted in your last sentence in the previous comment, the writes from any forked processes won't be seen by other processes.
  2. If you can rearrange the app logic in the main redbean process, then it should be possible to handle all write queries in the main process and fork separate processes for all read queries, thus still maintaining in-memory data.
  3. I have some cases where it may be useful to have in-memory DB and save/restore it (for example, using this data for rate-limiting), but it's not super critical if the data is lost.

pkulchenko avatar Jun 23 '22 22:06 pkulchenko

I think there are still some interesting scenarios that can be supported even with the current implementation.

Most definitely. Exposing these functions allows for a wider use of working with the sqlite files themselves. I changed the demo of redbean to simply include an backup example of the in-memory database and query from that backup.

The original idea is I think only feasible with SQLite VFS or simply using another in-memory database that allows cross process usage.

oltdaniel avatar Jun 24 '22 09:06 oltdaniel

@oltdaniel, one question on the patch: why is szBuf value set to 5 here: sqlite3_deserialize(db->db, "main", buffer, size, 5, 0);? According to the docs it should be the total size of the DB and should probably be larger than (or at least equal to) size (and ideally configurable).

I'm thinking about allowing szBuf to be configurable and if it's not set, then set it to size and add SQLITE_DESERIALIZE_RESIZEABLE to flags.

pkulchenko avatar Sep 24 '22 06:09 pkulchenko

@oltdaniel, also, can you send @jart the copyright assignment per the instructions here if you haven't done that already?

pkulchenko avatar Sep 24 '22 06:09 pkulchenko

@oltdaniel, one question on the patch: why is szBuf value set to 5 here: sqlite3_deserialize(db->db, "main", buffer, size, 5, 0);? According to the docs it should be the total size of the DB and should probably be larger than (or at least equal to) size (and ideally configurable).

Thanks for the hint @pkulchenko . Fixed it now, wasn't familiar with the SQLite API before.

Sorry for letting this get placed on the backburner. Also welcome as a new contributor. We need to go through one quick hurdle before we can proceed. Could you please email Justine Tunney [email protected] and say that you intend to assign her the copyright to the changes you contribute to Cosmopolitan? Please send the email under your legal name rather than anonymously. Please use a real email account associated with your identity. See CONTRIBUTING.md for further details. We make the ask because it helps us ship the tiniest most legally permissive prim and proper binaries.

No problem @jart . As I've used my noreply E-Mail from my GitHub Account for the commits, that isn't possible as such. Is there an alternative? Granting the copyright here should be enough.

I hereby assign the copyright of the code written by myself within this Pull Request to @jart .

oltdaniel avatar Oct 02 '22 10:10 oltdaniel

@oltdaniel The commit email address isn't an issue. You aren't required to disclose your email address to the public. You just have to disclose it to me. Your GitHub profile says you're Daniel Oltmanns. Send me an email from yourself, link to this PR in the email. Then after you've sent the email, leave a comment on this PR saying "I sent it!". That's all I need.

jart avatar Oct 02 '22 17:10 jart

@jart I sent it!

oltdaniel avatar Oct 03 '22 14:10 oltdaniel

Thanks for sending it! I'll go ahead and merge now. Thank you again for contributing this to the project!

jart avatar Oct 05 '22 14:10 jart

@oltdaniel, FYI: I made a couple of changes in #701 to allow the DB to grow beyond the initial buffer size.

pkulchenko avatar Nov 28 '22 06:11 pkulchenko