
Persistency improvements

Open szabodanika opened this issue 2 years ago • 7 comments

Currently all pastas are stored in memory. Can we save them to disk, either in a tiny database or simply in files? Storing everything in memory could be bad if the service is running on a tiny VM with limited RAM, for example, or if the pastas are important and the user wants to retain them even if the service is restarted.

szabodanika avatar Apr 27 '22 21:04 szabodanika

This feature has now been added.

szabodanika avatar May 02 '22 21:05 szabodanika

Hello! Thank you for your project!

I have two questions regarding the "persistency" feature:

  1. Could you please add an option to set the path to the "pasta_data" folder? This would be especially useful for Docker deployments.
  2. Are there any plans for other backends? Currently all data is stored in a JSON file, which might not hold up under heavy usage. For example, data could be stored in an SQLite database. Or even re-add the memory-only backend for "fire and forget" deployments.

MasterGroosha avatar Jun 05 '22 07:06 MasterGroosha

Welcome @MasterGroosha, thanks for your suggestions! Upgrading the simple file + JSON storage is something many have already asked about, and I am now thinking about the best way to go about it. I think the best option would be a flag like --sqlite [path to sqlite db] that enables an SQLite backend; otherwise MicroBin would keep using the default JSON format. Setting the path to pasta_data is definitely coming ASAP; until then you can customise where you run the executable from.
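Purely as an illustration, here is how such a pair of options could be wired up with the clap crate's derive API (clap 4 syntax); the flag and field names below are invented for the example and are not MicroBin's actual options:

```rust
use std::path::PathBuf;

use clap::Parser;

/// Hypothetical CLI options -- names are illustrative only.
#[derive(Parser)]
struct Args {
    /// Directory where pasta data is stored.
    #[arg(long, default_value = "pasta_data")]
    data_dir: PathBuf,

    /// Optional path to an SQLite database; if absent, keep the JSON backend.
    #[arg(long)]
    sqlite: Option<PathBuf>,
}

fn main() {
    let args = Args::parse();
    match &args.sqlite {
        Some(db) => println!("using SQLite backend at {}", db.display()),
        None => println!("using JSON backend in {}", args.data_dir.display()),
    }
}
```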

szabodanika avatar Jun 05 '22 18:06 szabodanika

I'm not very confident in my systems engineering knowledge, but I think it's not a good idea to store files as BLOBs in SQLite. What I had in mind was something like:

├── microbin
├── data (folder)
│   ├── database.sqlite
│   ├── files (folder)
│   │   ├── file1.txt
│   │   ├── file2.tar.gz
...

MasterGroosha avatar Jun 05 '22 20:06 MasterGroosha

As I see it, MicroBin is currently still limited by RAM: while the pastas are persisted to a file, they are all loaded into memory at startup and kept there.

I think either writing every pasta to an individual file or writing everything to SQLite would be a great improvement, as that would allow loading individual pastas from storage, making it more viable on small VMs. I do not know how much overhead saving BLOBs in SQLite adds, but if we split things up as MasterGroosha suggested, we might have to implement some locking, since the file and the rest of the pasta could be updated by different requests (a data race). I am not entirely sure what the concurrency implications of writing to a file without a lock are either. Given that, I think SQLite would be the superior option, as the library handles concurrency and atomic updates for us, and SQLite is widely used. Additionally, I think SQLite might be useful for implementing the feature requested in #10
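Purely as an illustration of that last point, a sketch using the rusqlite crate; the pasta and pasta_file tables and their columns are invented here, not MicroBin's actual schema. Because both rows live inside SQLite, the library provides the locking and atomicity that a "metadata in DB, bytes on disk" split would need by hand, and a SELECT by id avoids keeping every pasta in RAM:

```rust
use rusqlite::{params, Connection, Result};

/// Replace a pasta's text and its attached file in one atomic step.
fn update_pasta(conn: &mut Connection, id: i64, text: &str, file: &[u8]) -> Result<()> {
    let tx = conn.transaction()?;
    tx.execute(
        "UPDATE pasta SET content = ?1 WHERE id = ?2",
        params![text, id],
    )?;
    tx.execute(
        "UPDATE pasta_file SET data = ?1 WHERE pasta_id = ?2",
        params![file, id],
    )?;
    // A concurrent reader sees either the old pasta or the new one, never a mix.
    tx.commit()
}

/// Load a single pasta by id instead of keeping the whole list in memory.
fn load_content(conn: &Connection, id: i64) -> Result<String> {
    conn.query_row(
        "SELECT content FROM pasta WHERE id = ?1",
        params![id],
        |row| row.get(0),
    )
}
```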

I would love to take a crack at this, but I don't have much experience with database migrations in Rust. As I see it, our options are:

  • Diesel (easier migration setup, but no async)
  • SeaORM (async, but setting up migrations seems complicated)

uniqueNullptr2 avatar Jul 21 '22 15:07 uniqueNullptr2

I support the idea of reducing RAM usage. Since MicroBin is intended as a personal / tiny-org tool, I did not consider the footprint too much originally, but I would love to extend it with a flag like --sqlite [db file name] to keep everything within an SQLite DB and keep memory usage low(er). Since our data model is not very complex and we don't have any particularly complex queries, I don't think we need an ORM; a simple DB library with a few basic SQL queries should work absolutely fine IMHO. As for storing files as BLOBs within SQLite, I'd vote against it, mostly because of the added complexity, but I'd be happy to explore that idea after the base data (currently stored in JSON) is successfully stored in SQLite.
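As a rough sketch of what "a few basic SQL queries" could look like with plain rusqlite and no ORM (the table, columns, and Pasta struct below are invented for the example, not MicroBin's actual data model):

```rust
use rusqlite::{params, Connection, Result};

/// Hypothetical, heavily simplified pasta record.
struct Pasta {
    id: i64,
    content: String,
    created: i64,
}

/// Open (or create) the database file and make sure the table exists.
fn open_db(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS pasta (
             id      INTEGER PRIMARY KEY,
             content TEXT NOT NULL,
             created INTEGER NOT NULL
         );",
    )?;
    Ok(conn)
}

fn insert_pasta(conn: &Connection, content: &str, created: i64) -> Result<i64> {
    conn.execute(
        "INSERT INTO pasta (content, created) VALUES (?1, ?2)",
        params![content, created],
    )?;
    Ok(conn.last_insert_rowid())
}

fn get_pasta(conn: &Connection, id: i64) -> Result<Pasta> {
    conn.query_row(
        "SELECT id, content, created FROM pasta WHERE id = ?1",
        params![id],
        |row| {
            Ok(Pasta {
                id: row.get(0)?,
                content: row.get(1)?,
                created: row.get(2)?,
            })
        },
    )
}

fn main() -> Result<()> {
    let conn = open_db("microbin.db")?;
    let id = insert_pasta(&conn, "hello world", 0)?;
    println!("stored pasta {} -> {}", id, get_pasta(&conn, id)?.content);
    Ok(())
}
```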

szabodanika avatar Jul 25 '22 12:07 szabodanika

OK, I started some work on this but am running into some difficulty.

I thought it would be a good idea to replace the AppState, which is currently a Mutex<Vec<Pasta>>, with a dyn DataStore, where the DataStore trait can be implemented for both the current InMemoryJson store and the new SQLite store. This would make any future changes to the data store a lot easier, but it would be a big change for this feature.

The exact problem I have with this is that SQLite would return owned data, whereas right now we use an immutable reference, which is desirable: otherwise we would have to clone the whole pasta list to return it in the route, which would put further strain on memory usage. I think we could get around this by returning a Cow from the method.
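To illustrate the Cow idea, a minimal sketch (type and method names invented for the example) that sets the locking question aside for a moment: the in-memory store hands out a cheap borrow, while the SQLite store has to return owned rows.

```rust
use std::borrow::Cow;

#[derive(Clone)]
struct Pasta {
    id: u64,
    content: String,
}

trait DataStore {
    /// In-memory stores can return a borrow; SQLite has to return owned rows.
    fn all(&self) -> Cow<'_, [Pasta]>;
}

struct InMemoryStore {
    pastas: Vec<Pasta>,
}

impl DataStore for InMemoryStore {
    fn all(&self) -> Cow<'_, [Pasta]> {
        Cow::Borrowed(self.pastas.as_slice())
    }
}

struct SqliteStore; // would wrap a rusqlite::Connection in practice

impl DataStore for SqliteStore {
    fn all(&self) -> Cow<'_, [Pasta]> {
        // Rows read from SQLite are necessarily owned.
        let rows = vec![Pasta { id: 1, content: "from db".into() }];
        Cow::Owned(rows)
    }
}

fn main() {
    let store = InMemoryStore {
        pastas: vec![Pasta { id: 1, content: "hi".into() }],
    };
    // Borrowed for the in-memory store, owned for the SQLite store.
    println!("{} pasta(s)", store.all().len());
}
```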

The obvious naive implementation would be an if/else on the sqlite flag everywhere we interact with data, but that would be kinda ugly.

Thoughts on this?

edit: it's a catch-22. The SQLite API will always return owned data; that disparity can be solved with a Cow, but the API for the JSON store has some problems. If we handle mutability internally, we are also forced to return owned data, because a reference cannot be passed outside of the MutexGuard. However, if we put the Mutex on the outside, we cannot use trait objects, because Mutex<T> requires T to be Sized. Catch-22, so I am not sure how to proceed.
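One possible way to sidestep this, sketched below with invented names, is to keep the Mutex inside each implementation so that every trait method takes &self, the trait stays object-safe, and the store can be shared as an Arc<dyn DataStore>. The trade-off is exactly the one described above: lookups return owned Pasta values rather than references.

```rust
use std::sync::{Arc, Mutex};

#[derive(Clone)]
struct Pasta {
    id: u64,
    content: String,
}

/// Object-safe trait: locking lives inside each implementation.
trait DataStore: Send + Sync {
    fn get(&self, id: u64) -> Option<Pasta>;
    fn upsert(&self, pasta: Pasta);
}

struct JsonStore {
    pastas: Mutex<Vec<Pasta>>,
}

impl DataStore for JsonStore {
    fn get(&self, id: u64) -> Option<Pasta> {
        // The guard cannot escape this method, so we clone the matching pasta.
        self.pastas
            .lock()
            .unwrap()
            .iter()
            .find(|p| p.id == id)
            .cloned()
    }

    fn upsert(&self, pasta: Pasta) {
        let mut pastas = self.pastas.lock().unwrap();
        match pastas.iter_mut().find(|p| p.id == pasta.id) {
            Some(existing) => *existing = pasta,
            None => pastas.push(pasta),
        }
    }
}

fn main() {
    // Shareable across web handlers as application state.
    let store: Arc<dyn DataStore> = Arc::new(JsonStore {
        pastas: Mutex::new(Vec::new()),
    });
    store.upsert(Pasta { id: 1, content: "hello".into() });
    assert_eq!(store.get(1).map(|p| p.content), Some("hello".into()));
}
```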

uniqueNullptr2 avatar Jul 25 '22 15:07 uniqueNullptr2

Hey! Is this still being worked on? I wouldn't mind taking a crack at getting SQLite set up as a storage backend.

hay-kot avatar Oct 01 '22 18:10 hay-kot

Hey! Is this still being worked on? I wouldn't mind taking a crack at getting SQLite set up as a storage backend.

I think a lot of users would still like to see this implemented. I am also planning to create an "official" public MicroBin server, which will definitely need a high-performance backend. There is a nice PR for this, but I would like to see something finished before merging anything. Whether you take the route of a data-agnostic persistency API or something simpler is up to you. Once I am completely caught up with the smaller issues and PRs, I will have a more serious look at this myself as well.

szabodanika avatar Oct 22 '22 19:10 szabodanika

I currently don't have the time to work on this, so feel free to give it a go @hay-kot

uniqueNullptr2 avatar Oct 26 '22 12:10 uniqueNullptr2