CCF icon indicating copy to clipboard operation
CCF copied to clipboard

How to detect deletes in the index building

Open jeffa5 opened this issue 3 years ago • 5 comments

Is your feature request related to a problem? Please describe.

When building an index with the handle_committed_transaction we can iterate through the writes in that transaction but I can't see a way to detect what was deleted. Without this information we can't keep our data up to date accurately.

Describe the solution you'd like

A method or field on the StorePtr that returns an iterator or list of the keys that were deleted in this transaction.

Describe alternatives you've considered

Changing the application to model deletes with a write instead of performing an actual delete. This has the downside of not being so explicit and adds extra logic to ignore the tombstones in the main request handling logic.

jeffa5 avatar Sep 26 '22 09:09 jeffa5

This was probably introduced recently, in #4145, as it was possible to identify a deleted key (negative version), hence I think this is a bug? I'm not sure how we can fix this unless we re-introduce the old tombstone behaviour or implement https://github.com/microsoft/CCF/issues/3197 so that we can detect whether a key was deleted in a specific indexing interval? @eddyashton will be able to answer.

Edit: Because the deleted keys are also included in the corresponding ledger entry, it's probably not as complicated as what's described above (and probably unrelated to #4145) and may just involve creating a new type of kv::Store used by the indexer that doesn't hide deleted keys from get() and foreach().

jumaffre avatar Sep 26 '22 09:09 jumaffre

We are still tracking deletes in the per-transaction data that we're using, but we throw that away when we deserialise to a kv::Store. I think the real outcome here is that re-using the kv::Store and kv::Tx types for historical queries and indexes which are only seeing the results of a single transaction isn't actually helpful. We thought it was useful to provide a consistent API, but it's actually misleading (suggests we're operating over a projected KV state rather than a single transaction's write set), and conflates different use cases (as this issue notes, deliberately abstracting deletions vs wanting to know they happened).

So we should probably be providing something like the hook API for both of these systems. We can build a stubby ReadOnlyTx on top of that to maintain the current API.

eddyashton avatar Sep 27 '22 10:09 eddyashton

Having looked through the code a bit I think I'd plan to add this through the following changes:

  • Create a new TxDiff type that is similar to a ReadOnlyTx but wraps an untyped::ChangeSet directly (or perhaps just the untyped::Writes from within that, if that is all we need)
  • Rather than creating map handles from the TxDiff type apps would be able to get a MapDiff type out which provided a rudimentary interface (at least for now) to get untyped things and maybe a similar typed wrapper (tx_diff.get_map<MapDiff<K, V, KSer, VSer>>() or something)
  • The index Strategy would have another function (but with a default implementation doing nothing) for working directly with the TxDiff: handle_committed_diff (naming can change) which takes the TxID and the new TxDiff for the user to work with.

It would be nice to have the old handle_committed_transaction be callable from the default implementation of the new handle_committed_diff if we can build a store from it (or secretly keep it inside for now). This would avoid the need to update too many things around building up the Tx etc.

@eddyashton @achamayou what do you think?

jeffa5 avatar Oct 14 '22 09:10 jeffa5

@jeffa5 sounds reasonable to me, is it fair to characterize the overhead for users who only have handle_committed_transaction as minimal?

achamayou avatar Oct 14 '22 10:10 achamayou

Yes, I think it should be ok to do it so that they only pay for it when they ask, otherwise we basically continue with similar logic to now.

jeffa5 avatar Oct 14 '22 10:10 jeffa5