
[FLINK-34251][core] ClosureCleaner to include reference classes for non-serialization exception

Open liuml07 opened this issue 5 months ago • 3 comments

This is a revised version of the previously (auto-)closed PR #24205. More discussion is in the JIRA ticket: https://issues.apache.org/jira/browse/FLINK-34251

What is the purpose of the change

Currently, the ClosureCleaner throws an exception if {{checkSerializable}} is enabled and some object is not serializable. The exception message includes the non-serializable (nested) object.

However, as the user job grows more complex, with multiple operators each pulling in multiple third-party libraries, it becomes unclear how the non-serializable object is referenced, since such objects can be nested several levels deep. For example, the following exception gives little indication of where to look:

org.apache.flink.api.common.InvalidProgramException: java.lang.Object@528c868 is not serializable. 

It would be nice to include the reference stack in the exception message, as follows:

org.apache.flink.api.common.InvalidProgramException: java.lang.Object@72437d8d is not serializable.
Referenced via [com.mycompany.myapp.ComplexMap -> com.mycompany.myapp.LocalMap -> 
com.yourcompany.yourapp.YourPojo -> com.hercompany.herapp.Random -> java.lang.Object]
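
To illustrate the idea behind the reference stack shown above, here is a minimal sketch of tracking the chain of classes while recursively checking fields for serializability. It is not the code in this PR and not Flink's ClosureCleaner: the class ReferencePathSketch, its methods, and the demo classes Outer and Inner are hypothetical names introduced only for this example, and the sketch assumes it is enough to stop descending at JDK classes.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of reference-path tracking; not Flink's ClosureCleaner. */
class ReferencePathSketch {

    // Demo classes (assumptions for illustration only).
    static class Inner implements Serializable {
        Object payload = new Object(); // java.lang.Object is not Serializable
    }

    static class Outer implements Serializable {
        Inner inner = new Inner();
    }

    /** Returns class names from the root to the first non-serializable object, or an empty list. */
    static List<String> findNonSerializablePath(Object root) {
        Deque<String> path = new ArrayDeque<>();
        // Identity-based set so that cyclic object graphs terminate.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        return walk(root, path, visited) ? new ArrayList<>(path) : new ArrayList<>();
    }

    /** Formats the path in the style proposed in the PR description. */
    static String format(List<String> path) {
        return "Referenced via [" + String.join(" -> ", path) + "]";
    }

    private static boolean walk(Object obj, Deque<String> path, Set<Object> visited) {
        if (obj == null || !visited.add(obj)) {
            return false;
        }
        path.addLast(obj.getClass().getName());
        if (!(obj instanceof Serializable)) {
            return true; // leave the path intact so the caller can report it
        }
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getName().startsWith("java.")) {
                break; // assumption: do not reflect into JDK classes
            }
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)
                        || f.getType().isPrimitive()) {
                    continue; // these fields are not part of the serialized state
                }
                try {
                    f.setAccessible(true);
                    if (walk(f.get(obj), path, visited)) {
                        return true;
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read field " + f, e);
                }
            }
        }
        path.removeLast(); // everything reachable from obj is fine; backtrack
        return false;
    }
}
```

Calling format(findNonSerializablePath(new ReferencePathSketch.Outer())) yields a chain that ends in java.lang.Object, the same shape as the message proposed above. The path is kept in a deque so that each recursion level can drop its own entry when backtracking.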

Verifying this change

This change is largely covered by existing tests, and a new test case was added to ClosureCleanerTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

liuml07 avatar Jul 09 '25 23:07 liuml07

Could you review this, @davidradl ? Thanks

liuml07 avatar Jul 09 '25 23:07 liuml07

CI report:

  • 9bdfc072fbf0f1f21e44a33e86615dcf9982ae8a Azure: SUCCESS
Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Jul 09 '25 23:07 flinkbot

@gyfora if I recall correctly we ran into exactly the issue addressed by this PR in the Operator code. Could you please take a look?

afedulov avatar Jul 22 '25 09:07 afedulov