librustzcash introduce ability to cancel `scan_cached

pub fn scan_cached_blocks<ParamsT, DbT, BlockSourceT>(
    params: &ParamsT,
    block_source: &BlockSourceT,
    data_db: &mut DbT,
    limit: Option<u32>,
) -> Result<(), Error<DbT::Error, BlockSourceT::Error, DbT::NoteRef>>

Problem:

scanning a full block can take many seconds on mobile (some times up to 30 seconds) and can't be stopped by callers if needed.

Mobile applications must be able to cancel ongoing work when the operating system requests to or be terminated (SIG_ABORT or SIG_KILL sent from OS). This has many side effects.

When Running background tasks

Comment from Michal Fousek:

I am working on those long running background tasks. And I made it work. I am just not sure if it’s usable in our case. And maybe it would be better to suggest to developers to not do this. Problem is that execution of background tasks is up to system completely. You don’t know when and if the task will run. And it runs just for a few minutes. Which is not a much time. And when it expires there isn’t much time to stop the work. And stopping of SDK can take lot of time sometimes. Like when SDK is scanning block we are not able to interrupt that scanning. I am not sure if the few minutes in background is worth the troubles that can be caused by unexpected states.

Stopping the scanning process at will

Comment from Michal Fousek:

And about the short and simple background task - https://github.com/zcash/ZcashLightClientKit/pull/626. In that PR I implemented just stopping of the SDK when app goes to background. Here you mentioned that scan of block can take up to 24 seconds. Now batch size is 5 above 1_600_000 block height. Which means that one scan call in rust can take up to 120 seconds before we are able to interrupt it. iOS won’t probably give us that long for a background task. And SDK isn’t currently great at resuming of sync process. So the question I am asking if we shouldn’t do any background work at all at this point. And simply let app suspend. It will be easier to implement for developers. And maybe it will result in less troubles.

Proposed solution

Make scan_cached_blocks cancellable through the FFI. So that callers can halt scanning work on the spot and there's no more delays than a few milliseconds.

Nov 17 '22 12:11 pacu

Part of the intent of the DAGSync design is to enable precisely this kind of cancellation logic. Futures do nothing unless polled, so we can control the entire DAGSync engine by controlling its runtime; shut down the runtime, and everything else stops. The mobile SDK would call some Rust function, that starts its own thread for the runtime and returns a handle to it. The handle would then let the mobile SDK stop the reactor at any time, and any Rust-created background worker threads doing e.g. trial decryption can just be abandoned (and they will either finish and drop their results, or get killed).

scan_cached_blocks is not designed to be cancellable: it is a synchronous function intended to be run in a blocking thread. The introduction of a limit parameter enabled its runtime to be bounded, but as your comments above indicate, limit is not linear in wall-clock time.

Now, we can potentially modify scan_cached_blocks to enable something similar to what we plan to do with DAGSync, but it would require scan_cached_blocks to be used very differently by mobile SDKs (to the point where it is almost a different function). Currently the workflow is the following in a single thread (Android code, iOS code):

While there are blocks left to scan:
- Call scan_cached_blocks to scan up to limit blocks.
- Update the UI with progress.

With cancellation, mobile SDKs would instead need to do the following multi-threaded process:

While there are blocks left to scan:
- Call some new Rust API to get a "cancellation channel" (send/recv pair).
- Create a new thread that:
  - Calls scan_cached_blocks with the recv side of the channel and limit.
  - Notifies the parent thread that it is complete.
- Wait for either the child thread to notify that it is complete, or something to happen that requires cancellation.
- If cancellation is required, call a new Rust API with the send side of the channel to invoke cancellation.
- Wait for the child thread to finish.
- Update the UI with progress or in response to cancellation.

All of the above is just the new logic that would need to be implemented in both the Android and iOS mobile SDKs. The two new Rust APIs are relatively straightforward, but we would additionally need to thread the cancellation logic through every point in the scanning code where we want to be able to abort. And even then, there will be some boundary that we cannot cancel through (as there are minimum units of work being done).

Given all of the above, I think that instead of introducing the ability to cancel scan_cached_blocks, it would make more sense to move the mobile SDKs towards the scanning interface that DAGSync will expose, and have that interface internally call scan_cached_blocks with the above logic. This would move the multi-threaded process out of Android / iOS and into Rust (halving the amount of implementation work), and reduce the amount of work needed to move from this to full DAGSync (more precisely, it is front-loading some of that mobile SDK work in a way that can also be used for DAGSync).

In any case, this issue does imply doing work that would be thrown away once we have DAGSync, but as long as it isn't too much work then the trade-offs probably make sense (in that it helps the mobile UX in the short term).

Dec 09 '22 05:12 str4d

I'm curious how @str4d's response applies today, a year later. I think DAGSync has been implemented by now, and I'm also interested in cancellation.

Feb 16 '24 04:02 AArnott

librustzcash
librustzcash copied to clipboard

introduce ability to cancel `scan_cached_blocks`

Problem:

When Running background tasks

Stopping the scanning process at will

Proposed solution

librustzcash librustzcash copied to clipboard

introduce ability to cancel `scan_cached_blocks`

Problem:

When Running background tasks

Stopping the scanning process at will

Proposed solution

librustzcash
librustzcash copied to clipboard