node-event-storage

Return storage document header with reads to allow external sorting by sequenceNumber or time64

Open albe opened this issue 4 years ago • 1 comments

Since 0.7, the storage layer stores an external sequence number and a monotonic time64 timestamp in every document. So far, that information is not returned when reading from the storage.

Returning it requires a breaking API change.

albe avatar Feb 22 '21 08:02 albe

The EventStore read API should not deal with internal document sequence numbers and timestamps, so that part should not change. The goal, though, is that the storage sequence number can replace the storage-level global index for cross-stream (cross-partition) reads. At the very least, the global index should be an optional performance optimization rather than a requirement for reconstructing the document order. See #24, which requires iterating all partitions in insertion order to reindex documents.

The Storage read API currently consists of two methods:

read(number, index): document

This API method does not need to change. When reading a single document from the storage, the sequence number is already known and the timestamp is likely not of interest. In case they are needed, a new method can be added.

*readRange(from, until = -1, index = null): Generator<document>

This API method is supposed to return all documents in the order they were written to the storage. If an index is specified, only the documents in that index (stream) should be returned. Hence, technically this API shouldn't change either: a reader is likely not concerned with an individual document's sequence number (they only want the documents in the given range and in order) or timestamp. Again, an additional API method can be added to cover that use case.

So effectively, the *iterateRange(from, until, index) implementation should not read from the global index, but instead iterate over all partitions and return the documents in sequenceNumber order.
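A minimal sketch of that idea, not the library's actual implementation: it assumes each partition can be modelled as an array of documents that already sit in write order and each carry a `sequenceNumber` field. The names `mergeBySequenceNumber` and `iterateRangeSketch` are hypothetical, and `from`/`until` are assumed to be already normalized to inclusive, positive sequence numbers (so the `until = -1` convention is not handled here).

```javascript
// Hypothetical sketch: merge several partitions into one stream ordered by
// sequenceNumber. Each partition is assumed to be an array of documents
// that is already sorted by sequenceNumber (i.e. in write order).
function* mergeBySequenceNumber(partitions) {
    // One cursor per partition, pointing at its next unread document.
    const cursors = partitions.map(() => 0);
    while (true) {
        let next = -1;
        for (let i = 0; i < partitions.length; i++) {
            if (cursors[i] >= partitions[i].length) continue; // partition exhausted
            const candidate = partitions[i][cursors[i]];
            if (next === -1 ||
                candidate.sequenceNumber < partitions[next][cursors[next]].sequenceNumber) {
                next = i;
            }
        }
        if (next === -1) return; // all partitions exhausted
        yield partitions[next][cursors[next]++];
    }
}

// Hypothetical range reader built on the merge above.
// `from` and `until` are inclusive sequence numbers.
function* iterateRangeSketch(partitions, from, until) {
    for (const doc of mergeBySequenceNumber(partitions)) {
        if (doc.sequenceNumber > until) return; // past the range, stop early
        if (doc.sequenceNumber >= from) yield doc;
    }
}
```

The linear scan over cursors is fine for a handful of partitions; with many partitions a heap would likely be the better choice. Note that this sketch still walks the documents before `from`, which is exactly where an optional global index could help as a performance optimization.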

A potential additional API method could be something like

*readTimeRange(fromTime, untilTime): Generator<document>

which would return all documents within a given time range rather than a sequence number range. Once a method that iterates all documents and orders them by document metadata is implemented, adding this API should be straightforward. The biggest issue to solve is how to efficiently find the start and end points of the range. That could be solved by indexing the document time.
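As a rough illustration of how a time index could locate those points, here is a sketch that assumes a hypothetical in-memory index: an array of `{ time64, sequenceNumber }` entries sorted by `time64`. Because the timestamp is monotonic, that order matches the sequence number order. The function names and the inclusive bounds are assumptions made for the example, not part of the existing API.

```javascript
// Binary search for the first entry that satisfies `predicate`.
// Assumes the predicate is false for a prefix of `entries` and true
// for the rest. Returns entries.length if no entry matches.
function firstIndexWhere(entries, predicate) {
    let lo = 0;
    let hi = entries.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (predicate(entries[mid])) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}

// Hypothetical helper: translate an inclusive time range into an inclusive
// sequence number range, or null if no document falls inside it.
function findSequenceRange(timeIndex, fromTime, untilTime) {
    const start = firstIndexWhere(timeIndex, e => e.time64 >= fromTime);
    const end = firstIndexWhere(timeIndex, e => e.time64 > untilTime);
    if (start >= end) return null;
    return {
        from: timeIndex[start].sequenceNumber,
        until: timeIndex[end - 1].sequenceNumber
    };
}
```

The resulting sequence number range could then be handed to the existing range read, so readTimeRange would mostly be a lookup step in front of it.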

albe avatar May 30 '21 12:05 albe