restate icon indicating copy to clipboard operation
restate copied to clipboard

send with execution delay only responds with invocation id after delay

Open tillrohrmann opened this issue 1 year ago • 3 comments

When invoking a service in fire-and-forget style via send and with an execution delay restate:8080/MyService/myHandler/send?delay=1month, then it seems that the client only receives the invocation id after the execution delay. If possible, we should get the invocation id immediately.

tillrohrmann avatar Jul 01 '24 10:07 tillrohrmann

In order to trigger this, you also sent the request with idempotency id i assume?

slinkydeveloper avatar Jul 12 '24 14:07 slinkydeveloper

I investigated this, it needs a bit of internal refactoring of invocations in scheduled state. We need to introduce InvocationStatus::Scheduled, which comes before Inboxed as state transition.

slinkydeveloper avatar Jul 12 '24 16:07 slinkydeveloper

I investigated this, it needs a bit of internal refactoring of invocations in scheduled state. We need to introduce InvocationStatus::Scheduled, which comes before Inboxed as state transition.

Thanks for the investigation @slinkydeveloper. I think this makes sense to me. Introducing this change could be a good exercise to ensure backwards compatibility and being able to roll back. I guess we would need to introduce this change first guarded by a feature flag so that the runtime does not use newly introduced fields but can understand them. In the version after, we can then enable this feature by default.

tillrohrmann avatar Jul 13 '24 16:07 tillrohrmann

Summary of the offline discussion:

  • It makes sense to introduce a variant InvocationStatus::Scheduled, this has also some side benefits/fixes that we get for free (be able to cancel/kill scheduled invocations, list them with datafusion).
  • We also would like to take the opportunity to change the way we lay in data in the protobuf message, to something like this:
message Invocation {

   enum Status {
        UNKNOWN_STATUS = 0;
        INVOKED = 1;
        SUSPENDED = 2;
        SCHEDULED = 3;
        // [...]
    }

     // Common fields
     Status status = 1;
     InvocationTarget invocation_target = 2;
     // [...]

     // Specific to variant fields
     JournalMeta journal_meta = 3;
     // [...]
}

This is a tad bit better for forward/backward compat because it allows us to read some fields in case we don't recognize the variant (e.g. for better error reporting). It might also make more efficient scans where we don't need to read the full invocation status, but just the invocation target (e.g. on leader election).

  • In order to do so, we introduce a new table Invocation which substitutes the old invocation table.
  • On the next release 1.1 we add the ability to read from this table. If the config option worker.use_new_invocation_table (or smth like that) is true, then we also write to it. Test tools will test using this new config option enabled. Similarly, we add a variant to the timers for Invoke based on the new table. The implication of this is that in the partition processor state machine we'll need to know read/write the new table at every point. I might be able to hide some of this within the InvocationStatus struct, but still needs to leak to the state machine business logic.
  • On release 1.2, we add a migration script that automatically runs on boot to completely migrate all the invocations to the new table.

slinkydeveloper avatar Aug 13 '24 10:08 slinkydeveloper

  • On release 1.2, we add a migration script that automatically runs on boot to completely migrate all the invocations to the new table.

The migration step is probably optional if we want to remove the old invocation status table in version 1.3 and want to be sure that there are no old invocation status entries left, right?

tillrohrmann avatar Aug 14 '24 08:08 tillrohrmann

The migration step is probably optional if we want to remove the old invocation status table in version 1.3 and want to be sure that there are no old invocation status entries left, right?

Yes. It mostly benefits our code organization.

slinkydeveloper avatar Aug 14 '24 10:08 slinkydeveloper