motoko icon indicating copy to clipboard operation
motoko copied to clipboard

Best effort messages not making it to their destination?

Open ByronBecker opened this issue 6 months ago • 14 comments

@crusso @ggreif

I recent updated some code to use best effort messages, and it seems that when not explicitly awaited, a best effort message does not succeed.

This is the original function that worked (not using best effort messages).

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
    try {
      ignore historyTracker.track(canisterId);
    } catch (e) {
      print("error tracking canister " # debug_show (canisterId) # " Error: " # Error.message(e));
    };
  };

I recently updated the code to use this best effort messages instead (with a 20 second timeout, which should be plenty), while keeping the same ignore pattern.

This is the version of the function that is not working.

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
    try {
      ignore (with timeout = 20) historyTracker.track(canisterId);
    } catch (e) {
      print("error tracking canister " # debug_show (canisterId) # " Error: " # Error.message(e));
    };
  };

However, I've noticed recently that these calls to .track() are being dropped and not making it to their intended destination.

When I change the code to fully await the best effort message, it works.

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
    try {
      let result = await (with timeout = 20) historyTracker.track(canisterId);
      print("Successfully tracked canister " # debug_show (canisterId) # " Result: " # debug_show (result));
    } catch (e) {
      print("error tracking canister " # debug_show (canisterId) # " Error: " # Error.message(e));
    };
  };

The issue here though is that I want to use best effort messages for security purposes, but I don't want to hold up execution while the best effort message is sent, and instead to treat the call as a side effect.

ByronBecker avatar Jun 20 '25 01:06 ByronBecker

These use the same delivery flow in the replica as unbounded-wait messages, so there should be no difference. I'll try a repro!

EDIT: We have extensive tests for the deadline in test/run-drun/par.mo, which should cover your case. Maybe you have a quick look?

ggreif avatar Jun 20 '25 07:06 ggreif

I see a small problem (maybe due to the flag register I use and don't reset?)

$ diff -u test/run-drun/ok/non-par.drun-run.ok test/run-drun/ok/par.drun-run.ok
--- test/run-drun/ok/non-par.drun-run.ok	2025-05-21 15:56:43.330896000 +0200
+++ test/run-drun/ok/par.drun-run.ok	2025-05-21 15:57:10.486431000 +0200
@@ -6,9 +6,9 @@
 debug.print: test2()
 ingress Completed: Reply: 0x4449444c0000
 debug.print: test3()
-debug.print: oneshot deadline set: true
+debug.print: oneshot deadline set: false
 debug.print: oneshot cycles: 0
-debug.print: oneshot deadline set: true
+debug.print: oneshot deadline set: false
 debug.print: oneshot cycles: 3_456
 debug.print: oneshot deadline set: true
 debug.print: oneshot cycles: 0
@@ -26,3 +26,15 @@
 debug.print: test7()
 debug.print: CAUGHT: #system_unknown "Call deadline has expired."
 ingress Completed: Reply: 0x4449444c0000
+debug.print: test8()
+debug.print: quux: 98_765
+ingress Completed: Reply: 0x4449444c0000
+debug.print: test9()
+debug.print: localOuter: 876
+debug.print: localOuter: 0
+debug.print: local: 1_029
+debug.print: local: 42
+ingress Completed: Reply: 0x4449444c0000
+debug.print: test10()
+debug.print: anon: 34_567
+ingress Completed: Reply: 0x4449444c0000

Update: I have added some debug logs

$ git diff
diff --git a/src/prelude/internals.mo b/src/prelude/internals.mo
index 385a8dd74..6c4b58c50 100644
--- a/src/prelude/internals.mo
+++ b/src/prelude/internals.mo
@@ -21,7 +21,7 @@ func @add_cycles<system>() {
     (prim "cyclesAdd" : Nat -> ()) cycles;
   };
   switch @timeout {
-    case (?timeout) { @timeout := null; (prim "timeoutSet" : Nat32 -> ()) timeout };
+  case (?timeout) { (prim "print" : Text -> ())("timeoutSet: " # debug_show timeout); @timeout := null; (prim "timeoutSet" : Nat32 -> ()) timeout };
     case null ()
   }
 };

and I see for test3 in par.mo

 debug.print: test3()
+debug.print: timeoutSet: 5
+debug.print: timeoutSet: 5
 debug.print: oneshot deadline set: false
 debug.print: oneshot cycles: 0
 debug.print: oneshot deadline set: false
    public func test3() : async () {
        debugPrint "test3()";
        oneshot();
        (with cycles = 3456) oneshot();
        (with timeout = 5) oneshot();
        (with timeout = 5; cycles = 4567) A.oneshot();
    };

versus in non-par.mo

 debug.print: test3()
+debug.print: timeoutSet: 5
 debug.print: oneshot deadline set: true
 debug.print: oneshot cycles: 0
 debug.print: oneshot deadline set: true
    public func test3() : async () {
        debugPrint "test3()";
        oneshot();
        Cycles.add<system> 3456; oneshot();
        (with timeout = 5) oneshot();
        Cycles.add<system> 4567; A.oneshot();
    };
$ grep "oneshot deadline set" test/run-drun/ok/non-par.drun-run.ok
debug.print: oneshot deadline set: true
debug.print: oneshot deadline set: true
debug.print: oneshot deadline set: true <----- this one is right, the others not!
debug.print: oneshot deadline set: true

Why does it have a deadline in all of them? There should only be one true! Debugging...

UPDATE: Gotcha!

diff --git a/test/run-drun/non-par.mo b/test/run-drun/non-par.mo
index d03f7624f..ccfe82fe3 100644
--- a/test/run-drun/non-par.mo
+++ b/test/run-drun/non-par.mo
@@ -1,4 +1,4 @@
-import { call_raw; debugPrint; principalOfActor; errorMessage; errorCode } = "mo:
⛔";
+import { call_raw; debugPrint; principalOfActor; replyDeadline; errorMessage; errorCode } = "mo:⛔";
 import Cycles = "cycles/cycles";
 // This is a copy of `par.mo`, but with legacy `Cycles.add`. It should output equivalent results.
 
@@ -20,7 +20,7 @@ actor A {
 
 
     public func oneshot() {
-        debugPrint ("oneshot deadline set: " # debug_show true); // faking
+        debugPrint ("oneshot deadline set: " # debug_show (0 != replyDeadline()));
         debugPrint ("oneshot cycles: " # debug_show(Cycles.available()));
     };
 

My fault: https://github.com/dfinity/motoko/pull/5357

ggreif avatar Jun 20 '25 08:06 ggreif

Hey @ggreif just checking in on this. We reverted our code to not use best effort messages for the time being, but I'm curious what you discovered on your end.

ByronBecker avatar Jul 09 '25 18:07 ByronBecker

what you discovered on your end

Sorry, no progress in this regard yet.

ggreif avatar Jul 09 '25 18:07 ggreif

Let's keep this in mind: dfinity/ic#5902

ggreif avatar Jul 11 '25 11:07 ggreif

Hi @ByronBecker, I tried to reproduce this with a minimal example, but I couldn't.

import Iter "mo:base/Iter";

actor {
  var nr_pings : Nat = 0;
  public shared func make_pings() : async () {
    for (i in Iter.range(0, 5)) {
      ignore (with timeout = 10) ping();
    }
  };

  public shared func reset_counter() : async () {
    nr_pings := 0;
  };

  public shared func ping() : async () {
    nr_pings += 1;
  };

  public func get_counter() : async Nat {
    return nr_pings;
  }; 
};

(sorry for the terrible Motoko) And then:

% dfx canister call ignore-bounded-backend get_counter
(0 : nat)
% dfx canister call ignore-bounded-backend make_pings 
()
% dfx canister call ignore-bounded-backend get_counter
(6 : nat)

This seems to be working as expected, without any difference in whether I await or ignore the result. Could you help me get to a working minimal counterexample? Maybe things have changed since June, I'm on dfx 0.28.0 which uses moc 0.14.13

oggy-dfin avatar Aug 20 '25 09:08 oggy-dfin

In this version of the code, the catch would never be entered, because you aren't awaiting any of the track promises. So even if one of them times out or fails with another error, you won't see the error.

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
    try {
      ignore historyTracker.track(canisterId);
    } catch (e) {
      print("error tracking canister " # debug_show (canisterId) # " Error: " # Error.message(e));
    };
  };

How are you calling trackCanister? If there's no await or commit point after the call, the messages will be queued, but never sent.

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
     ignore historyTracker.track(canisterId);
     await async {}; // send the track messages here
  };

would ensure the messages are at least sent (but give you no indication of the timeout).

crusso avatar Aug 20 '25 11:08 crusso

@Gabor, do you reset the timeout register? I think you should, otherwise the first message that uses a timeout will cause all others to use one.

crusso avatar Aug 20 '25 11:08 crusso

@gabor, do you reset the timeout register? I think you should, otherwise the first message that uses a timeout will cause all others to use one.

Sure I do:

var @timeout : ?Nat32 = null;

// Function called by backend to add funds to call.
// DO NOT RENAME without modifying compilation.
func @add_cycles<system>() {
  let cycles = @cycles;
  @reset_cycles();
  if (cycles != 0) {
    (prim "cyclesAdd" : Nat -> ()) cycles;
  };
  switch @timeout {
    case (?timeout) { @timeout := null; (prim "timeoutSet" : Nat32 -> ()) timeout };
    case null ()
  }
};

ggreif avatar Aug 20 '25 14:08 ggreif

How are you calling trackCanister? If there's no await or commit point after the call, the messages will be queued, but never sent.

@crusso I'm calling it from a public API with ignore HistoryTrackerService.trackCanister(historyTracker, args.canisterId);

The internal trackCanister function also has an ignore, so in that sense the ignore is "nested".

ByronBecker avatar Aug 20 '25 15:08 ByronBecker

What is return type of trackCanister, async or async*? If async*, the ignore has not effect at all (no message is sent)

crusso avatar Aug 20 '25 15:08 crusso

What is return type of trackCanister, async or async*? If async*, the ignore has not effect at all (no message is sent)

The return type is async* ()

ByronBecker avatar Aug 20 '25 15:08 ByronBecker

Then how come this version type checks? It's using an await, not await*? Or did you also change the type of .track in this modified version?

public func trackCanister(
    historyTracker : Interface.Service,
    canisterId : Principal,
  ) : async* () {
    try {
      let result = await (with timeout = 20) historyTracker.track(canisterId);  // <- this should be await* if track returns async*
      print("Successfully tracked canister " # debug_show (canisterId) # " Result: " # debug_show (result));
    } catch (e) {
      print("error tracking canister " # debug_show (canisterId) # " Error: " # Error.message(e));
    };
  };

crusso avatar Aug 20 '25 15:08 crusso

@ByronBecker JFYI, #5419 will warn you if you try to ignore an async* computation. It will also warn if it is not the only reference to the computation, like in

let cmp = async* { ... };
ignore cmp;

but this is quite unusual (and pointless) code.

ggreif avatar Aug 22 '25 22:08 ggreif