narwhal icon indicating copy to clipboard operation
narwhal copied to clipboard

[umbrella] Finalize causal completion in read_causal lookups

Open huitseeker opened this issue 3 years ago • 2 comments

Description

We added the backend read_causal graph walk in #214 and exposed it through its endpoint in #258.

~~However, the DAG natively does not have any cross-network functionality: when we request a particular certificate digest as the start of the DAG walk, we immediately fail if this vertex is not in the DAG.~~

~~By contrast, since #128 / #183 a get_collections call will trigger a network request to fetch a missing collection. Moreover, we have functionality in the synchronizer since #214 / #240 to fetch missing certificates as well as missing collections.~~

Edit: this issue is to close the gap between the logic for delivering new certificates retrieved from the network, which is store-based, and the logic for retrieving them, which is DAG-based:

  • #230
  • #232
  • #233

huitseeker avatar May 31 '22 20:05 huitseeker

@huitseeker if I understood the bug correctly, your are saying that we don't trigger the causal completion for the starting point of the DAG walk on the read_causal endpoint, right? The thing is that we actually do, this is the call here: https://github.com/MystenLabs/narwhal/blob/2e33d54a78e6e14138a86bb301a36ec578381c3c/primary/src/grpc_server/validator.rs#L74 . Behind the scenes the block_synchronizer_handler will trigger - if needed - the causal completion of the providing ids and wait until it's done , so we can proceed further.

Now, this is something we discussed with @arun-koshy and I believe with you as well, since the synchronization of when a certificate has been delivered is happening based on the certificate_store, its possible that the certificate has appeared in the storage (hence the "handler" has seen this as causally completed) , but the certificate hasn't reached the DAG yet since its delivered via a different channel/mechanism - so we progress on the DAG walk but the certificate hasn't been delivered yet to the DAG it self. Basically by addressing the MystenLabs/narwhal#230 MystenLabs/sui#5263 MystenLabs/narwhal#233 we'll be able to tackle this problem.

Please let me know if I got this completely wrong.

akichidis avatar May 31 '22 22:05 akichidis

Ah, no, it seems you got it completely right. My bad, the trigger slipped my mind 🤦 . And indeed, MystenLabs/narwhal#230, MystenLabs/sui#5263, MystenLabs/narwhal#233 cover the concurrency issue between the point of arrival of the certificate (the store) and the point of retrieval (the dag).

So in the end, we pretty much want to scrap that issue and make it an umbrella for those 3 sub-issues.

huitseeker avatar May 31 '22 23:05 huitseeker

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Aug 08 '22 02:08 github-actions[bot]

Closing in favor of MystenLabs/sui#5263 which this now duplicates.

huitseeker avatar Aug 08 '22 13:08 huitseeker