risingwave
Tracking: deprecate safe epoch and generalize time travel query
Proposal
Generalize time travel query to all batch queries, which means that every batch query will be handled as a time travel query.
Within a single `HummockVersion`, we will only provide a single view at `committed_epoch` rather than views at every epoch between `safe_epoch` and `committed_epoch`. As a result, `safe_epoch` can be deprecated.
Moreover, we need to deprecate support for consistent barrier reads on uncommitted epochs.
Motivation
Currently, `HummockVersion` carries a `safe_epoch`, which specifies that within this `HummockVersion` it is safe to query any epoch above `safe_epoch`. In other words, a single `HummockVersion` can serve queries on multiple versions of data at different epochs. This feature exists because each compute node (CN) holds only a single latest `HummockVersion` (ignoring versions pinned by already-created iterators), while in the frontend each session pins its own epoch (`PinnedSnapshot`), and queries from these different pinned epochs must all be served by that single latest `HummockVersion`.
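To make the current model concrete, here is a minimal Rust sketch of one version serving reads at several epochs. `MultiEpochVersion` and its fields are hypothetical simplifications, not the real `HummockVersion`. The sketch assumes a read is valid when `safe_epoch <= read_epoch <= committed_epoch` and returns the newest value written at or before the read epoch, which is why every such value must be kept around.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified model of a version that keeps every value
/// needed to answer reads at any epoch in `[safe_epoch, committed_epoch]`.
/// This is NOT the real `HummockVersion`.
pub struct MultiEpochVersion {
    pub safe_epoch: u64,
    pub committed_epoch: u64,
    /// key -> (epoch -> value); a `None` value marks a delete.
    pub data: BTreeMap<Vec<u8>, BTreeMap<u64, Option<Vec<u8>>>>,
}

impl MultiEpochVersion {
    /// Returns the value of `key` visible at `read_epoch`, or an error if
    /// the epoch is outside the range this version can serve.
    pub fn get(&self, key: &[u8], read_epoch: u64) -> Result<Option<Vec<u8>>, String> {
        if read_epoch < self.safe_epoch || read_epoch > self.committed_epoch {
            return Err(format!(
                "epoch {} outside [{}, {}]",
                read_epoch, self.safe_epoch, self.committed_epoch
            ));
        }
        Ok(self
            .data
            .get(key)
            // Newest version whose epoch is not after the read epoch.
            .and_then(|versions| versions.range(..=read_epoch).next_back())
            .and_then(|(_, value)| value.clone()))
    }
}
```

Two sessions pinned at different epochs can both read through the same `MultiEpochVersion`, but only because the older values are still stored next to the newer ones.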
This design makes the communication between the frontend and CNs elegant, but it comes at a price:
- In Hummock, we may store multiple versions of a key's value at different epochs, and these versions are stored physically next to each other in the SST. For most streaming internal states, we only read the latest value of a key, so storing multiple versions incurs unnecessary cost.
- We need to maintain, and even persist, the pinned snapshots so that the frontend is not affected when the meta node crashes and restarts.
Now that batch queries support time travel, we no longer have to rely on a single Hummock version to serve queries on different epochs; instead, we can rebuild a Hummock version for a specific epoch. Therefore, we can generalize time travel query to all batch queries: for every batch query, we first obtain a Hummock version for the provided epoch, either by using the latest version or by rebuilding a new one, and then read data from that version. Each Hummock version then no longer needs to store multiple versions of a key, and `safe_epoch` can be deprecated.
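A rough sketch of this generalized read path, assuming a version is modeled only as the set of SST ids visible at its committed epoch, and that older versions are rebuilt by replaying per-epoch deltas on top of a stored checkpoint. `Version`, `VersionDelta` and `version_for_epoch` are hypothetical names for illustration; the actual rebuild logic in RisingWave may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical simplified version: the SST ids visible at `committed_epoch`.
#[derive(Clone, Debug, PartialEq)]
pub struct Version {
    pub committed_epoch: u64,
    pub sst_ids: BTreeSet<u64>,
}

/// Hypothetical delta recorded when `epoch` was committed.
pub struct VersionDelta {
    pub epoch: u64,
    pub added: Vec<u64>,
    pub removed: Vec<u64>,
}

/// Returns the version visible at `epoch`: reuse `latest` when it already
/// matches, otherwise start from the newest checkpoint not after `epoch`
/// and replay the deltas up to and including `epoch`.
pub fn version_for_epoch(
    latest: &Version,
    checkpoints: &[Version],  // assumed sorted by committed_epoch ascending
    deltas: &[VersionDelta],  // assumed sorted by epoch ascending
    epoch: u64,
) -> Option<Version> {
    if epoch == latest.committed_epoch {
        return Some(latest.clone());
    }
    if epoch > latest.committed_epoch {
        return None; // not committed yet, nothing to read
    }
    let base = checkpoints.iter().rev().find(|v| v.committed_epoch <= epoch)?;
    let mut version = base.clone();
    for delta in deltas
        .iter()
        .filter(|d| d.epoch > base.committed_epoch && d.epoch <= epoch)
    {
        for id in &delta.removed {
            version.sst_ids.remove(id);
        }
        version.sst_ids.extend(delta.added.iter().copied());
        version.committed_epoch = delta.epoch;
    }
    Some(version)
}
```

Each rebuilt version describes exactly one epoch, so the data it references only needs the latest value of each key as of that epoch.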
Besides, we need to deprecate support for consistent barrier reads on uncommitted epochs. Currently, for an uncommitted barrier read, we pin the current uncommitted non-checkpoint epoch and use this epoch in the batch query. However, since the pinned epoch is a non-checkpoint epoch, once the following checkpoint epoch is committed, the pinned epoch falls below the committed epoch. To support a consistent query on this epoch, the committed version would still have to keep multiple versions of values between the committed epoch and the previous checkpoint epoch. To simplify things, we will still support barrier reads, but a barrier-read batch query will no longer carry any epoch information. It always reads the latest uncommitted data of each table, and consistency is not guaranteed.
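The proposed barrier read semantics can be sketched as follows. `TableState` and `barrier_read` are hypothetical; the point is that the read takes no epoch argument and simply prefers the newest uncommitted write over committed data, so nothing ties reads of different tables to a common epoch.

```rust
use std::collections::HashMap;

/// Hypothetical per-table state: committed data plus not-yet-committed writes.
pub struct TableState {
    pub committed: HashMap<Vec<u8>, Vec<u8>>,
    /// Latest uncommitted write per key; `None` marks a delete.
    pub uncommitted: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl TableState {
    /// Barrier read without any epoch: the newest uncommitted write wins,
    /// falling back to committed data. Each table is read in whatever state
    /// it happens to be in, so results across tables may be inconsistent.
    pub fn barrier_read(&self, key: &[u8]) -> Option<Vec<u8>> {
        match self.uncommitted.get(key) {
            Some(pending) => pending.clone(),
            None => self.committed.get(key).cloned(),
        }
    }
}
```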
Tracking
- [ ] deprecate support for consistent barrier read
  - https://github.com/risingwavelabs/risingwave/pull/18230
- [ ] generalize time travel to all batch queries, which means we always use `HummockReadEpoch::TimeTravel` in batch queries, and refine the read logic accordingly
- [ ] deprecate `safe_epoch` and stop persisting `PinnedSnapshot`
- [ ] ignore the safe epoch in compaction, and only keep the latest value of a key during compaction
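Once `safe_epoch` is ignored, compaction can drop every older version of a key. A minimal sketch, assuming the input is sorted by key ascending and then by epoch descending, so the first entry seen for a key is its latest version. `Entry` and `compact_latest_only` are hypothetical names, and tombstone removal (only safe at the bottom level) is deliberately left out.

```rust
/// Hypothetical compaction entry: (user key, epoch, value or tombstone).
pub type Entry = (Vec<u8>, u64, Option<Vec<u8>>);

/// Keeps only the newest version of each key. Assumes `entries` is sorted by
/// key ascending and, within a key, by epoch descending. Tombstones are kept
/// so they can still shadow older values in lower levels.
pub fn compact_latest_only(entries: Vec<Entry>) -> Vec<Entry> {
    let mut out: Vec<Entry> = Vec::new();
    for entry in entries {
        // The first entry of each key run is its latest version.
        let is_new_key = out.last().map_or(true, |last| last.0 != entry.0);
        if is_new_key {
            out.push(entry);
        }
    }
    out
}
```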