[FLINK-34251][core] ClosureCleaner to include reference classes for non-serialization exception
This is a revised version of the previous (auto-)closed PR #24205. More discussion in the JIRA: https://issues.apache.org/jira/browse/FLINK-34251
What is the purpose of the change
Currently, the ClosureCleaner throws an exception if {{checkSerializable}} is enabled and some object is not serializable. The exception message includes the non-serializable (nested) object.
However, as a user job program grows more complex, pulling in multiple operators that each depend on several third-party libraries, it becomes unclear how the non-serializable object is referenced, because it may be nested several levels deep. For example, the following exception gives no hint about where to look:
org.apache.flink.api.common.InvalidProgramException: java.lang.Object@528c868 is not serializable.
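Plain Java serialization is no more helpful here. The following standalone sketch is only an illustration: it uses java.io.ObjectOutputStream directly rather than Flink's ClosureCleaner, and the classes Outer, Middle, and Inner are hypothetical. It serializes an object whose only problem is a java.lang.Object field two levels down; the NotSerializableException message names just the offending class, not the chain of references leading to it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NestedSerializationDemo {

    // Hypothetical user classes: only the innermost field is non-serializable.
    static class Inner implements Serializable {
        Object payload = new Object(); // java.lang.Object does not implement Serializable
    }

    static class Middle implements Serializable {
        Inner inner = new Inner();
    }

    static class Outer implements Serializable {
        Middle middle = new Middle();
    }

    /** Serializes obj and returns the NotSerializableException message, or null on success. */
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The message names only the offending class, not the path Outer -> Middle -> Inner.
        System.out.println(trySerialize(new Outer()));
    }
}
```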
It would be nice to include the reference stack in the exception message, as follows:
org.apache.flink.api.common.InvalidProgramException: java.lang.Object@72437d8d is not serializable.
Referenced via [com.mycompany.myapp.ComplexMap -> com.mycompany.myapp.LocalMap ->
com.yourcompany.yourapp.YourPojo -> com.hercompany.herapp.Random -> java.lang.Object]
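The idea of reporting a reference path can be illustrated with a minimal sketch. This is not the code in this PR: the class ReferencePathChecker and its exception type are hypothetical, and for simplicity the sketch checks `instanceof Serializable` during a depth-first reflective walk instead of attempting real serialization. It also does not descend into JDK classes (which avoids module-access errors on Java 9+) or into array and collection elements. Each class name is pushed onto a path when the walk descends and popped when it returns, so the path always reflects the chain from the root object to the current one.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class ReferencePathChecker {

    /** Hypothetical exception type for this sketch (Flink itself throws InvalidProgramException). */
    static class NotSerializableWithPathException extends RuntimeException {
        NotSerializableWithPathException(String message) {
            super(message);
        }
    }

    /** Depth-first walk over instance fields; fails on the first non-serializable object. */
    public static void check(Object root) {
        Deque<String> path = new ArrayDeque<>();
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, path, visited);
    }

    private static boolean isJdkClass(Class<?> c) {
        String name = c.getName();
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    private static void walk(Object obj, Deque<String> path, Set<Object> visited) {
        if (obj == null || !visited.add(obj)) {
            return; // nothing to check, or already visited (also breaks reference cycles)
        }
        path.addLast(obj.getClass().getName()); // push: one level deeper
        if (!(obj instanceof Serializable)) {
            throw new NotSerializableWithPathException(
                    obj + " is not serializable. Referenced via ["
                            + String.join(" -> ", path) + "]");
        }
        // Walk declared fields up the class hierarchy, stopping at JDK classes.
        for (Class<?> c = obj.getClass(); c != null && !isJdkClass(c); c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                int mods = field.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)
                        || field.getType().isPrimitive()) {
                    continue; // not serialized, or trivially serializable
                }
                field.setAccessible(true);
                try {
                    walk(field.get(obj), path, visited);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        path.removeLast(); // pop: done with this object
    }

    // Hypothetical user classes for the demo.
    static class Leaf implements Serializable {
        Object lock = new Object(); // java.lang.Object does not implement Serializable
    }

    static class Root implements Serializable {
        String name = "root";
        Leaf leaf = new Leaf();
    }

    public static void main(String[] args) {
        try {
            check(new Root());
        } catch (NotSerializableWithPathException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Compiled without a package, `main` prints a message of the form `java.lang.Object@<hash> is not serializable. Referenced via [ReferencePathChecker$Root -> ReferencePathChecker$Leaf -> java.lang.Object]`, which mirrors the proposed format.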
Verifying this change
This change is largely covered by existing tests, and a new test case was added to ClosureCleanerTest.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)
Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
Could you review this, @davidradl? Thanks!
CI report:
- 9bdfc072fbf0f1f21e44a33e86615dcf9982ae8a Azure: SUCCESS
@gyfora, if I recall correctly, we ran into exactly the issue addressed by this PR in the Operator code. Could you please take a look?