CCF
CCF copied to clipboard
How to detect deletes in the index building
Is your feature request related to a problem? Please describe.
When building an index with the handle_committed_transaction we can iterate through the writes in that transaction but I can't see a way to detect what was deleted. Without this information we can't keep our data up to date accurately.
Describe the solution you'd like
A method or field on the StorePtr that returns an iterator or list of the keys that were deleted in this transaction.
Describe alternatives you've considered
Changing the application to model deletes with a write instead of performing an actual delete. This has the downside of not being so explicit and adds extra logic to ignore the tombstones in the main request handling logic.
This was probably introduced recently, in #4145, as it was possible to identify a deleted key (negative version), hence I think this is a bug? I'm not sure how we can fix this unless we re-introduce the old tombstone behaviour or implement https://github.com/microsoft/CCF/issues/3197 so that we can detect whether a key was deleted in a specific indexing interval? @eddyashton will be able to answer.
Edit: Because the deleted keys are also included in the corresponding ledger entry, it's probably not as complicated as what's described above (and probably unrelated to #4145) and may just involve creating a new type of kv::Store used by the indexer that doesn't hide deleted keys from get() and foreach().
We are still tracking deletes in the per-transaction data that we're using, but we throw that away when we deserialise to a kv::Store. I think the real outcome here is that re-using the kv::Store and kv::Tx types for historical queries and indexes which are only seeing the results of a single transaction isn't actually helpful. We thought it was useful to provide a consistent API, but it's actually misleading (suggests we're operating over a projected KV state rather than a single transaction's write set), and conflates different use cases (as this issue notes, deliberately abstracting deletions vs wanting to know they happened).
So we should probably be providing something like the hook API for both of these systems. We can build a stubby ReadOnlyTx on top of that to maintain the current API.
Having looked through the code a bit I think I'd plan to add this through the following changes:
- Create a new
TxDifftype that is similar to aReadOnlyTxbut wraps anuntyped::ChangeSetdirectly (or perhaps just theuntyped::Writesfrom within that, if that is all we need) - Rather than creating map handles from the
TxDifftype apps would be able to get aMapDifftype out which provided a rudimentary interface (at least for now) to get untyped things and maybe a similar typed wrapper (tx_diff.get_map<MapDiff<K, V, KSer, VSer>>()or something) - The index
Strategywould have another function (but with a default implementation doing nothing) for working directly with theTxDiff:handle_committed_diff(naming can change) which takes theTxIDand the newTxDifffor the user to work with.
It would be nice to have the old handle_committed_transaction be callable from the default implementation of the new handle_committed_diff if we can build a store from it (or secretly keep it inside for now). This would avoid the need to update too many things around building up the Tx etc.
@eddyashton @achamayou what do you think?
@jeffa5 sounds reasonable to me, is it fair to characterize the overhead for users who only have handle_committed_transaction as minimal?
Yes, I think it should be ok to do it so that they only pay for it when they ask, otherwise we basically continue with similar logic to now.