
How to cope with changes to object definition between pickle and unpickle

Open DavidGoodenough opened this issue 9 years ago • 3 comments

I may have missed something, but reading the docs for pickling I have not seen any comments about what to do if I update the classes that define the pickled data. So if I am running an application with version 1 of a data class, pickle the data and store it in a file, then load a new version of the JAR containing the classes and try to unpickle the data, what will happen?

Presumably if none of the data fields (as opposed to methods) have changed then all should be well, but how does the unpickling process cope with either old fields which no longer exist, or new ones that have been created? Presumably new fields are not a problem if they have default values, and I could always leave old fields in place and just stop using them. But what I do not want to happen is for the unpickle to fail.

David

DavidGoodenough avatar Oct 27 '15 11:10 DavidGoodenough

This is a great question. Migration of data is VERY hard. For sbt-serialization, we actually added special handling for Option[T] data types in the picklers. It allows fields to be absent without failing the pickler. The mechanism here, then, is to use Option[T] for things which are added (or may not be present), allowing old values to be serialized and new values to also serialize.
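A minimal sketch of the Option[T] idea, assuming a plain Map[String, String] as a stand-in for the pickled form. This is not the scala-pickling or sbt-serialization API; `User`, `UserCodec` and the field names are hypothetical and only illustrate how an optional field lets old data keep loading.

```scala
// Hypothetical sketch: a Map[String, String] stands in for the pickled form.
// None of this is the scala-pickling or sbt-serialization API.

// Version 2 of the class: `email` was added after version 1 data was stored.
final case class User(name: String, email: Option[String])

object UserCodec {
  // Write only the fields that are present; a None email is simply omitted.
  def write(u: User): Map[String, String] =
    Map("name" -> u.name) ++ u.email.map(e => "email" -> e)

  // Read tolerantly: a missing "email" becomes None, and unknown keys
  // (fields that were removed from the class) are ignored.
  def read(fields: Map[String, String]): User =
    User(
      name = fields("name"),       // required in every version
      email = fields.get("email")  // absent in version 1 data
    )
}

object OptionFieldDemo {
  // Data written by version 1, which had a now-removed field and no email.
  val v1Data: Map[String, String] =
    Map("name" -> "David", "legacyFlag" -> "true")

  val fromV1: User = UserCodec.read(v1Data)

  val roundTrip: User =
    UserCodec.read(UserCodec.write(User("Ann", Some("ann@example.org"))))
}
```

Here `OptionFieldDemo.fromV1` loads with `email = None` instead of failing, and the removed `legacyFlag` key is skipped. Type changes are not covered by this approach.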

Generally, though, protocol compatibility is considered outside the realm of picklers themselves. You can, e.g., create a custom pickle container that has a protocol version, which is then used to select the types you pickle/unpickle. I highly recommend taking time to think through how your project needs to handle serialization compatibility and writing an extensive test suite to ensure it is not broken. If you'd like an example, see:

https://github.com/sbt/sbt-remote-control/blob/master/protocol-test/src/test/scala/sbt/protocol/ProtocolSpec.scala
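The versioned container idea could look roughly like the sketch below. It assumes a hand-rolled `Envelope` with a Map payload rather than any real pickle format; `Envelope`, `Account`, `AccountProtocol` and the field names are all hypothetical. The version number selects how the payload is decoded, so a type change between versions is handled by an explicit migration.

```scala
import scala.util.Try

// Hypothetical container: a protocol version plus an untyped payload.
// This is an illustration only, not a scala-pickling type.
final case class Envelope(version: Int, payload: Map[String, String])

// Current in-memory model. Version 1 stored whole units under "balance";
// version 2 stores cents under "balanceCents".
final case class Account(id: String, balanceCents: Long)

object AccountProtocol {
  val CurrentVersion = 2

  // Always write using the current protocol version.
  def write(a: Account): Envelope =
    Envelope(
      CurrentVersion,
      Map("id" -> a.id, "balanceCents" -> a.balanceCents.toString)
    )

  // Look up a field and parse it as a Long, reporting problems as Left.
  private def longField(p: Map[String, String], key: String): Either[String, Long] =
    p.get(key)
      .toRight(s"missing field: $key")
      .flatMap(s => Try(s.toLong).toOption.toRight(s"not a number: $key"))

  // Pick the decoder from the stored version and migrate old data forward.
  def read(e: Envelope): Either[String, Account] = e.version match {
    case 1 =>
      for {
        id <- e.payload.get("id").toRight("missing field: id")
        units <- longField(e.payload, "balance")
      } yield Account(id, units * 100) // migrate whole units to cents
    case 2 =>
      for {
        id <- e.payload.get("id").toRight("missing field: id")
        cents <- longField(e.payload, "balanceCents")
      } yield Account(id, cents)
    case other =>
      Left(s"unsupported protocol version: $other")
  }
}
```

A compatibility test suite can then keep one stored sample per historical version and assert that `AccountProtocol.read` still decodes each of them after every code change.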

jsuereth avatar Nov 18 '15 17:11 jsuereth

My need is to pickle data for storage in a DB. And yes, I want to store objects as blobs; I do not need or want to use a SQL DB and store the data as fields. I then need to be able to unpickle it at a later time when the code may have been updated. Thus I need to be able to tolerate both additions and deletions of fields, and possibly type changes.

DavidGoodenough avatar Nov 19 '15 22:11 DavidGoodenough

David, I would look at Avro, or some other serialization mechanism that explicitly supports schema lifecycle.

jeffrey-aguilera avatar Dec 18 '15 07:12 jeffrey-aguilera