flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-33192] Fix Memory Leak in WindowOperator due to Improper Timer Cleanup

Open kartikeypant opened this issue 1 year ago • 2 comments

What is the purpose of the change

  • This Pull Request addresses an important state memory leak issue identified within the default window operator of Apache Flink.
  • After this change, the cleanup timer should be registered for every window that's added to the window state regardless of it emitting a result after it’s fired.
  • This change is associated to the following bug: FLINK-33192

Brief change log

org.apache.flink.streaming.runtime.operators.windowing.WindowOperator

Verifying this change

  • WindowOperatorTest is working for functional correctness.
  • Memory Leak Prevention Validation: Due to the nature of this change aimed at preventing memory leaks, conducting validation via a unit test presents challenges.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

kartikeypant avatar Jun 08 '24 07:06 kartikeypant

CI report:

  • d745169b106e3d20a90a744678a7ecbede5fd55d Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Jun 08 '24 07:06 flinkbot

@kartikeypant Thanks for the PR, overall LGTM, would you like to add a test about this in WindowOperatorTest?

fredia avatar Jun 14 '24 06:06 fredia

Hey @fredia, thank you for your feedback!

I have added the test testCleanupTimerWithEmptyStateNoResultForTumblingWindows (as specified) in WindowOperatorTest and updated the description of the PR with the same.

Could you please review the changes and let me know if any further adjustments are needed?

kartikeypant avatar Jul 02 '24 08:07 kartikeypant

Sure, thanks @fredia for the guidance - will back-port these changes to v1.18/1.19/1.20 as well as suggested.

kartikeypant avatar Jul 05 '24 08:07 kartikeypant

@flinkbot run azure

fredia avatar Jul 05 '24 10:07 fredia