interface-spec icon indicating copy to clipboard operation
interface-spec copied to clipboard

feat: on low wasm memory hook

Open ielashi opened this issue 11 months ago • 12 comments

This proposal is based on this forum post and has already been approved by motion proposal 106146.

Canister developers have to actively monitor their canisters to know if they are low on wasm memory. If detected too late, a canister can be completely stuck and forever un-upgradable.

To address this, we introduce a hook called on_low_wasm_memory. When defined, it is triggered whenever the canister's memory usage exceeds some developer-defined threshold.

ielashi avatar Mar 22 '24 14:03 ielashi

🤖 Here's your preview: https://oov64-qqaaa-aaaak-qcnsa-cai.icp0.io/docs

github-actions[bot] avatar Mar 22 '24 14:03 github-actions[bot]

@mraszyk I didn't make any changes to the formal model in this PR yet, as I first wanted to get your feedback that the proposal as-is looks good.

ielashi avatar Mar 22 '24 14:03 ielashi

@mraszyk I didn't make any changes to the formal model in this PR yet, as I first wanted to get your feedback that the proposal as-is looks good.

@ielashi @mraszyk I assume that we're fine with the current status of this PR so can the formal model be updated so we can move this to "FINAL" and be ready to start implementation at our convenience?

dsarlis avatar May 13 '24 08:05 dsarlis

@mraszyk Is the formal model something that you'd update yourself or should we look into that on our side?

ielashi avatar May 13 '24 16:05 ielashi

I assume that we're fine with the current status of this PR

How about the call_on_cleanup analogy that I raised in the unresolved comment above?

mraszyk avatar May 13 '24 19:05 mraszyk

I assume that we're fine with the current status of this PR

How about the call_on_cleanup analogy that I raised in the unresolved comment above?

@mraszyk Doesn't this answer your question?

ielashi avatar May 14 '24 11:05 ielashi

@ielashi I replied

The reason for that is that call_on_cleanup, unlike this hook, cannot make outbound calls.

As long as it can schedule a one-off timer, I don't see a major issue with this limitation.

on that thread and you put thumbs up so I thought you saw that.

mraszyk avatar May 14 '24 12:05 mraszyk

@ielashi I replied

The reason for that is that call_on_cleanup, unlike this hook, cannot make outbound calls.

As long as it can schedule a one-off timer, I don't see a major issue with this limitation.

on that thread and you put thumbs up so I thought you saw that.

I did, and my impression was the issue is resolved, or did I misunderstand?

ielashi avatar May 15 '24 07:05 ielashi

We discussed this PR in yesterday's spec meeting together with @dsarlis and we concluded that it'd make more sense to guarantee that enough cycles are reserved for the hook and that the hook runs as the first message after the wasm memory limit threshold is crossed and then it's fine to treat it as a system task (rather than, e.g., call_on_cleanup).

mraszyk avatar May 15 '24 12:05 mraszyk

@mraszyk @ielashi @dsarlis Question 1: WDYT when this hook should be executed:

  1. In every round canister is scheduled - this approach seems non-optimal because when we do not have anything to execute there is no risk of lack of memory
  2. In every round canister executes a task or message - this approach is the safest since we will ensure that canister has enough memory before every execution, but we will have a bigger overhead when we execute hook with for example every heartbeat
  3. In every round canister executes a message - this approach seems like reasonable tradeoff, but I am not sure that decreasing overhead from previous point is worth the risk

Question 2: If canister executes multiple messages (and/or tasks) in a single round should the hook be invoked before every execution?

dragoljub-duric avatar Jul 02 '24 13:07 dragoljub-duric

As described above, we suggest keeping the old semantics (specifying a limit on allocated heap memory rather than "remaining" memory), in which case the scheduling is also much clearer (just execute once when an execution of the canister exceeds the limit, immediately after the execution that causes the hook to run).

Dfinity-Bjoern avatar Jul 02 '24 13:07 Dfinity-Bjoern

@Dfinity-Bjoern @mraszyk If we start keeping information about the hook as {"Condition is not satisfied", "Executed", "Ready to be executed"}, and based on that we use to determine if we will run hook, do you think we should persist that information in snapshots? Screenshot 2024-07-30 at 15 15 20

dragoljub-duric avatar Jul 30 '24 13:07 dragoljub-duric