electric icon indicating copy to clipboard operation
electric copied to clipboard

feat: `PersistedShapeStream` with generic storage interface

Open msfstef opened this issue 1 year ago • 2 comments

Addresses https://github.com/electric-sql/electric/issues/1519

Followed by https://github.com/electric-sql/electric/issues/1519 for more refactoring and improvements, split into two PRs for reviewability

Introduces a PersistedShapeStream - follows the newly defined ShapeStreamInterface so it can be used interchangeably, but requires:

  1. A storage option to be provided, implementing the ShapeStreamStorage interface, which is a key-value store oriented interface
  2. To not provide an offset and shapeId as the persisted shape stream cannot resume from an arbitrary offset and shapeId, it uses the persisted ones (although I suppose we could make it wipe the underlying store if an offset and shpaeId are provided to simulate the exact same behaviour)

What it does is it subscribes to the underlying ShapeStream and compacts entries to the provided store, very similar to what Shape does. When someone subscribes to the PersistedShapeStream, it looks into the store and returns the compacted entries as inserts and resumes from there.

One tricky bit with a bit of a hacky solution is the persistence of the offset and shapeId, needed to actually resume the stream. We could store them in a separate metadata store entry, but that might require either 1) custom key prefixes on the store to separate metadata and 2) deal with inconsistent state.

For example, if metadata is stored separately from the operations and the store has no transactional guarantees (which the interface assumes), an operation might be stored with offset 10_3, and the metadata fails to get stored because the app shuts down - so upon resuming the 10_3 operation is fed to the subscriber but then the stream is resumed from the previous latest offset that we managed to persist (e.g. 10_2), duplicating the change.

The workaround is that the latest stored offset can be retrieved by reading into the stored entries and comparing their offsets (assuming the offset is not opaque here, but we can do that by adding a timestamp otherwise) and using the most recent one - has an overhead of needing to read the whole store to resume but is foolproof and can also do consistency checks to decide if it actually can resume.

The shapeId I've stored alongside the entries, but really only in one entry stored, such that when we iterate over the data to get the latest offset we will eventually find the shapeId as well, but without needing to store it fully in every entry. Depending on the actual storage implementation this approach might be effective or slightly annoying. My main concern was to 1) avoid dealing with consistency with non-transactional stores and 2) not introduce separate storage APIs that people need to conform to.

If someone wants to use a transactional storage engine, they might as well go to PGlite with the sync plugin - I believe the use case for this is simpler key-value stores like localStorage etc.

msfstef avatar Sep 04 '24 16:09 msfstef

an operation might be stored with offset 10_3, and the metadata fails to get stored because the app shuts down - so upon resuming the 10_3 operation is fed to the subscriber but then the stream is resumed from the previous latest offset that we managed to persist (e.g. 10_2), duplicating the change.

We could also filter out the extra operations to avoid the duplicates.

That seems better so that on loading we can immediately send the sync call and not have to wait to load all the old data first.

KyleAMathews avatar Sep 08 '24 17:09 KyleAMathews

How about we put this on the back burner for some time? Being able to persist the shape log will be a nice feature at some point but it's not particularly urgent until we get a lot more users and some that definitely need this. Most users won't need this and I don't think it's worth a lot of time to benchmark and fine-tune this — especially if it requires a big refactor.

KyleAMathews avatar Sep 09 '24 02:09 KyleAMathews

This is pretty out-of-date w/ main & not a priority so I'll close it for now.

KyleAMathews avatar Oct 08 '24 16:10 KyleAMathews