[feat] persist/restore database

Open gioboa opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe. To cope with cloud environments (where instances may reboot), an API to persist and restore the in-memory database would be fantastic.

Describe the solution you'd like: I'd like this to use the file system, because you can resize disks without rebooting the instance and disk is cheaper than RAM.

gioboa avatar Aug 01 '22 07:08 gioboa

Hi @gioboa! We've already implemented this feature 🙂 We can serialize the entire index and write it to disk, for the reason you mentioned. We're currently trying to implement a common API for Node.js, Bun, and Deno, as disk access may vary between them.

We'll be releasing this feature with the first stable release or so

micheleriva avatar Aug 01 '22 07:08 micheleriva

Nice 👏 I'm looking forward to trying this feature 👍

gioboa avatar Aug 01 '22 08:08 gioboa

Super interested in the solution taken.

I was serializing the tree to a JSON representation, with a separate file for the doc map. Seeding from a file of just schema documents was very slow for cold starts on edge nodes (AWS Lambda): parsing, tokenizing, and inserting circa 14,000 took 450 ms.

edit

@micheleriva for the API on this, it would be interesting to know whether only disk support will be available, or whether object storage will also be supported, e.g.:

  • AWS S3
  • Cloudflare R2
  • Google Cloud Storage Buckets

I would assume this can be achieved if the serialization returns any kind of buffer that can be HTTP streamed to these object stores.
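
A minimal sketch of that idea, assuming the serialized index is a JSON-safe object. Nothing here is Lyra's actual API: `index`, `toBuffer`, `fromBuffer`, `createMemoryStore`, `backup`, and `restore` are hypothetical names, and the in-memory store only stands in for an S3, R2, or GCS client with `put`/`get` methods.

```javascript
// Hypothetical stand-in for a serialized index: any JSON-safe object.
const index = { schema: { title: 'string' }, docs: { 1: { title: 'hello' } } };

// Serialize to bytes and back. Buffer is a Node.js global; in a browser a
// TextEncoder would produce an equivalent Uint8Array.
const toBuffer = (data) => Buffer.from(JSON.stringify(data), 'utf8');
const fromBuffer = (buf) => JSON.parse(buf.toString('utf8'));

// Hypothetical object-store client backed by a Map. A real client would send
// the same bytes over HTTP instead of keeping them in memory.
const createMemoryStore = () => {
  const objects = new Map();
  return {
    put: async (key, body) => {
      objects.set(key, Buffer.from(body)); // store a copy of the bytes
    },
    get: async (key) => objects.get(key),
  };
};

// Back up and restore through whatever store is passed in.
const backup = async (store, key, data) => store.put(key, toBuffer(data));
const restore = async (store, key) => fromBuffer(await store.get(key));
```

Because `backup` and `restore` only depend on `put` and `get`, swapping disk for an object store would mean passing a different store rather than changing the serialization.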

simonireilly avatar Aug 01 '22 13:08 simonireilly

@micheleriva it would be nice if this worked in a browser too, so you could back it up in IndexedDB.

matthewp avatar Aug 03 '22 14:08 matthewp

@simonireilly we fully rewrote our APIs and data structures right before the first release. You can now serialize the whole index using Protocol Buffers, dpack, JSON, or whatever you prefer; we made it compatible with most modern serialization formats 🙂 That means you could also run this in the browser (cc @matthewp)
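
One way to picture format-agnostic serialization is a small codec registry, sketched below under the assumption that the index can be exported as plain data. `codecs`, `exportIndex`, and `importIndex` are hypothetical names rather than Lyra functions, and only a JSON codec is registered; a Protocol Buffers or dpack codec would be added the same way.

```javascript
// Hypothetical codec registry: each format turns plain data into bytes and back.
// TextEncoder and TextDecoder are available in browsers, Node.js, Deno, and Bun.
const codecs = new Map([
  ['json', {
    encode: (data) => new TextEncoder().encode(JSON.stringify(data)),
    decode: (bytes) => JSON.parse(new TextDecoder().decode(bytes)),
  }],
]);

// Look up a codec by name, failing loudly for formats nobody registered.
const getCodec = (format) => {
  const codec = codecs.get(format);
  if (!codec) throw new Error(`Unsupported format: ${format}`);
  return codec;
};

// Export plain index data as bytes in the requested format.
const exportIndex = (data, format = 'json') => getCodec(format).encode(data);

// Rebuild plain index data from bytes written in the given format.
const importIndex = (bytes, format = 'json') => getCodec(format).decode(bytes);
```

Since the output is just bytes, the same export could be written to disk, uploaded, or stored in the browser.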

micheleriva avatar Aug 07 '22 11:08 micheleriva

@matthewp, @gioboa, a little update: this is a work in progress in the https://github.com/LyraSearch/plugin-disk-persistence repo. Any help would be highly appreciated 🙏

micheleriva avatar Aug 10 '22 12:08 micheleriva

I'm gonna close this issue and pin it as it might be interesting for other people wondering how to persist Lyra data

micheleriva avatar Aug 10 '22 12:08 micheleriva

This sounds great, thanks a lot!

matthewp avatar Aug 10 '22 13:08 matthewp

From an API perspective, if I was creating a persistence package, I would want the ability to persist on each change to the database. It might be nice if a lyra DB was an event emitter for that reason. I could imagine a disk persistence package working like:

const persist = (db, format) => {
  saveToDisk(db, format);

  db.addEventListener('change', () => {
    saveToDisk(db, format);
  });
};

That way a user would only need to call persist once and not manage calling it every time they inserted, removed, etc.
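
A runnable sketch of that shape, assuming a database that extends the standard `EventTarget` and dispatches a `change` event on every mutation. `ObservableDb`, its methods, and the injected `save` callback are all hypothetical and stand in for a Lyra instance and for `saveToDisk`.

```javascript
// Hypothetical in-memory DB that announces every mutation with a 'change' event.
class ObservableDb extends EventTarget {
  #docs = new Map();

  insert(id, doc) {
    this.#docs.set(id, doc);
    this.dispatchEvent(new Event('change'));
  }

  remove(id) {
    // Only notify listeners when something was actually deleted.
    if (this.#docs.delete(id)) this.dispatchEvent(new Event('change'));
  }

  snapshot() {
    return Object.fromEntries(this.#docs);
  }
}

// `save` is injected so the same function works for disk, IndexedDB, etc.
const persist = (db, save) => {
  const onChange = () => save(db.snapshot());
  onChange(); // persist the initial state once
  db.addEventListener('change', onChange);
  // Return a function that stops persisting.
  return () => db.removeEventListener('change', onChange);
};
```

Saving on every change could get expensive for bulk inserts, so a real package would probably want to debounce or batch the `save` calls.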

matthewp avatar Aug 10 '22 13:08 matthewp

@matthewp let's discuss this in the disk persistence repo 🙂

micheleriva avatar Aug 10 '22 13:08 micheleriva

@micheleriva Happy to. I'm interested in creating an IndexedDB package, so this would (hopefully) be part of the Lyra DB API and not just specific to disk persistence.

matthewp avatar Aug 10 '22 13:08 matthewp