
[FLINK-35412][State/Runtime] Batch execution of async state request callback

Zakelly opened this issue 1 year ago • 2 comments

What is the purpose of the change

Currently, when a state request finishes, its callback is wrapped into a mail and inserted into the mailbox. This PR puts a batch of consecutive callbacks into one mail and executes them together, which reduces interaction with the mailbox queue. Additionally, the newly introduced runner propagates the hasMail flag, which is essential for fine-grained scheduling in AsyncExecutionController (AEC); this will be exposed in a later PR.
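To make the saving concrete, here is a minimal sketch that counts mailbox queue operations for the two strategies. MailCountSketch and CountingMailbox are hypothetical names, and the mailbox is modeled as a plain ArrayDeque drained by a single thread rather than Flink's own mailbox classes. The sketch is deliberately idealized: every callback is already pending when the batch mail is created, whereas the runner described in this PR only batches whatever happens to be pending and does not wait to collect more.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class MailCountSketch {

    /** Hypothetical single-threaded mailbox that counts how often its queue is touched. */
    static final class CountingMailbox {
        private final Queue<Runnable> queue = new ArrayDeque<>();
        int puts = 0;  // number of mails inserted into the queue
        int takes = 0; // number of mails taken out and executed

        void put(Runnable mail) {
            puts++;
            queue.add(mail);
        }

        // The "task thread" loop: take and run mails until the queue is empty.
        void runAll() {
            Runnable mail;
            while ((mail = queue.poll()) != null) {
                takes++;
                mail.run();
            }
        }
    }

    /** One mail per finished state request; returns {puts, takes, callbacksRun}. */
    static int[] perCallback(int callbacks) {
        CountingMailbox box = new CountingMailbox();
        int[] executed = {0};
        for (int i = 0; i < callbacks; i++) {
            box.put(() -> executed[0]++);
        }
        box.runAll();
        return new int[] {box.puts, box.takes, executed[0]};
    }

    /** Consecutive callbacks share one mail that runs them back to back. */
    static int[] batched(int callbacks) {
        CountingMailbox box = new CountingMailbox();
        int[] executed = {0};
        Queue<Runnable> batch = new ArrayDeque<>();
        for (int i = 0; i < callbacks; i++) {
            batch.add(() -> executed[0]++);
        }
        box.put(() -> {
            Runnable cb;
            while ((cb = batch.poll()) != null) {
                cb.run();
            }
        });
        box.runAll();
        return new int[] {box.puts, box.takes, executed[0]};
    }

    public static void main(String[] args) {
        int[] a = perCallback(1000);
        int[] b = batched(1000);
        System.out.printf("per-callback: %d puts, %d takes, %d callbacks%n", a[0], a[1], a[2]);
        System.out.printf("batched:      %d puts, %d takes, %d callbacks%n", b[0], b[1], b[2]);
    }
}
```

Running main reports 1000 puts and 1000 takes for the per-callback strategy versus 1 put and 1 take for the batched one, with all 1000 callbacks executed in both cases.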

Brief change log

  • Introduce BatchCallbackRunner, which receives individual callbacks and runs them in batches.

Verifying this change

This change is already covered by existing tests on AEC.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Zakelly, May 23 '24 10:05

CI report:

  • 54986860da05706ee459dff5338b60924d7ca523 Azure: SUCCESS

flinkbot, May 23 '24 10:05

@jectpro7 Let me elaborate on this a little more. This runner aims to encapsulate multiple callbacks into one mail, but it still runs each callback as soon as possible: there is no waiting or gathering, and small batches are totally acceptable. We already buffer the state requests in AEC, and I don't want to introduce more latency here. This is only a small optimization to reduce the number of mails and the interactions with the mail queue.

So the rules are:

  1. If there is any pending callback, there will be a mail.
  2. There is at most one mail at a time.
  3. In each mail, the task thread handles at most a certain number of callbacks.

Hope this clarifies what I'm trying to do.
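The three rules can be sketched as a small runner. This is a hypothetical illustration, not Flink's BatchCallbackRunner: BoundedBatchRunner, submit, hasMail and batchSize are names chosen here, and a plain java.util.concurrent.Executor stands in for the task's mailbox. An AtomicBoolean enforces rule 2, re-checking the queue after each batch keeps rule 1 true for leftover or late callbacks, and the loop bound gives rule 3. Reading hasMail as "a mail is queued or running" is an assumption about what the flag means.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

public class BoundedBatchRunnerDemo {

    /** Hypothetical runner illustrating the three rules; not Flink's BatchCallbackRunner. */
    static final class BoundedBatchRunner {
        private final Executor mailbox; // stand-in for the task's mailbox (assumption)
        private final int batchSize;    // rule 3: at most this many callbacks per mail
        private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean mailInFlight = new AtomicBoolean(false);

        BoundedBatchRunner(Executor mailbox, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.mailbox = mailbox;
            this.batchSize = batchSize;
        }

        /** Called when a state request finishes; may be called from any thread. */
        void submit(Runnable callback) {
            pending.add(callback);
            insertMailIfNeeded();
        }

        /** Assumed meaning of the flag: a mail is currently queued or running. */
        boolean hasMail() {
            return mailInFlight.get();
        }

        // Rules 1 and 2: enqueue a mail only if callbacks are pending and none is in flight.
        private void insertMailIfNeeded() {
            if (!pending.isEmpty() && mailInFlight.compareAndSet(false, true)) {
                mailbox.execute(this::runBatch);
            }
        }

        // Runs on the task thread: run up to batchSize callbacks without waiting for more.
        private void runBatch() {
            try {
                for (int i = 0; i < batchSize; i++) {
                    Runnable cb = pending.poll();
                    if (cb == null) {
                        break;
                    }
                    cb.run();
                }
            } finally {
                mailInFlight.set(false);
            }
            // Leftover or newly arrived callbacks get exactly one fresh mail.
            insertMailIfNeeded();
        }
    }

    /** Single-threaded mailbox simulation; returns {mailsPut, maxQueuedMails, hasMailAfter}. */
    static int[] simulate(int callbacks, int batchSize, List<Integer> order) {
        Queue<Runnable> mailQueue = new ArrayDeque<>();
        int[] mailsPut = {0};
        Executor mailbox = m -> {
            mailsPut[0]++;
            mailQueue.add(m);
        };
        BoundedBatchRunner runner = new BoundedBatchRunner(mailbox, batchSize);
        for (int i = 0; i < callbacks; i++) {
            final int id = i;
            runner.submit(() -> order.add(id));
        }
        int maxQueued = mailQueue.size();
        Runnable mail;
        while ((mail = mailQueue.poll()) != null) { // the task thread's mailbox loop
            mail.run();
            maxQueued = Math.max(maxQueued, mailQueue.size());
        }
        return new int[] {mailsPut[0], maxQueued, runner.hasMail() ? 1 : 0};
    }

    public static void main(String[] args) {
        List<Integer> order = new ArrayList<>();
        int[] r = simulate(10, 4, order);
        System.out.println(order + " ran via " + r[0] + " mails; max queued = " + r[1]);
    }
}
```

With 10 callbacks submitted before the task thread runs and a batch size of 4, the simulation uses three mails (4 + 4 + 2) and never has more than one mail queued at once. Nothing is held back to fill a batch: each mail simply runs whatever is pending when it executes, up to the bound.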

Zakelly, May 28 '24 10:05

Based on our testing, we were able to get some performance improvements with this PR. Will merge this.

Zakelly, Sep 19 '24 10:09