storm icon indicating copy to clipboard operation
storm copied to clipboard

[STORM-3566] add serialVersionUID field to class which implement Serializable

Open howardliu-cn opened this issue 5 years ago • 8 comments

[STORM-3566]add serialVersionUID field to class which implement Serializable interface.

howardliu-cn avatar Jan 14 '20 05:01 howardliu-cn

Thanks for your contribution. Please file a JIRA and then update this PR title and commit message to prefix with the JIRA number.

Ethanlm avatar Jan 14 '20 17:01 Ethanlm

Thanks for your contribution. Please file a JIRA and then update this PR title and commit message to prefix with the JIRA number.

I have file a JIRA. The link is https://issues.apache.org/jira/browse/STORM-3566.

howardliu-cn avatar Jan 15 '20 12:01 howardliu-cn

Please also update your commit message to prefix with the JIRA number. Thanks. Other than that, I am +1

Ethanlm avatar Jan 15 '20 22:01 Ethanlm

Sorry I have a second thought on this. Although serialVersionUID is recommended, we should not add it blindly.

serialVersionUID should be added where there is a need for backwards compatibility or interoperability.

For example, the user defined Bolt/Spout. User defines their own Bolt/Spout and submits the topology over the network to Nimbus. Eventually supervisors will get the assignment and deserialize bolt/spout and put it to work. serialVersionUID is recommended in this case because user might submit their topology from a JVM different than where the supervisor is running on (e.g. Oracle vs IBM java). In this case, if serialVersionUID is not set, a UID will be generated (say 1234L) and stored in the serialized object. When supervisor tries to deserialize it, because it's running on different JVM, it could generate a different UID for this object (say 3456L), then an InvalidClassException would be thrown. Hence we want to declare serialVersionUID explicitly to avoid this.

But examples like EventLoggerBolt, it's only accessed by nimbus or supervisor itself. Not declaring serialVersionUID should be just fine. It's almost definitely devs will forget to update serialVersionUID manually on the code they changes. Declare serialVersionUID explicitly might cause unnecessary trouble.

All I am trying to say here is we should think about this carefully and do it case by case. I would like to hear some inputs from @srdo @kishorvpatil too. I feel like this is a more complicated issue than I originally thought.

Some useful links:

  1. https://www.vojtechruzicka.com/explicitly-declare-serialversionuid/
  2. https://community.sonarsource.com/t/serializable-classes-should-use-auto-generated-version-ids/1217

Thanks for your work on this.

Ethanlm avatar Jan 23 '20 23:01 Ethanlm

@Ethanlm I'm not sure.

I think we should probably add the field for all bolts and classes serialized as part of bolt state, since these classes are likely to be serialized by the submitter JVM, and then deserialized by the supervisor/worker.

For other classes, we should probably look at whether they get serialized by one JVM, and then deserialized by another. If they are, we should add the field, and will then have to remember to change the value if we add/change fields in PRs.

Regarding whether we remember to update the field, I think there might be a compatibility issue in the current code. As it is now, I think we can't change state in any of the serializable classes that are not packaged in topology jars without breaking user code due to changed UID. This would for example include the bolts in storm-client. I think if we change a class there, we would run into issues with rolling upgrades, since e.g. Nimbus might be running 2.1.0, and the worker might be running 2.0.1, and the version of the class that is loaded depends on the Storm version instead of depending on the contents of the topology jar.

For classes that get packaged as part of the topology jar (e.g. any bolts in the externals or examples), we should be fine.

I'm not sure how to solve the compatibility issue (assuming it exists), without moving the serializable classes out of storm-client, and into another module that we then require topology jars to package.

srdo avatar Jan 24 '20 11:01 srdo

@srdo Thanks. I agree with you

Ethanlm avatar Jan 27 '20 15:01 Ethanlm

@srdo @Ethanlm Thanks. Would you approve this PR? Or, Do I need to do some change?

howardliu-cn avatar Feb 09 '20 11:02 howardliu-cn

@howardliu-cn I think we need to distinguish what classes should be added with serialVersionUID and what shouldn't because of the reasons we discussed above. The decision needs to be made case by case. serialVersionUID shouldn't be added blindly.

Ethanlm avatar Feb 10 '20 15:02 Ethanlm