openmmtools icon indicating copy to clipboard operation
openmmtools copied to clipboard

Make a shallow copy of the serialized dict in utils.deserialize before popping the module name and class

Open dwhswenson opened this issue 3 years ago • 4 comments

If I do "1.5 cycles" of serialization (i.e., serialize, deserialize the resulting dict, reserialize the result of that) I would expect that the dict for the originally serialized object would be identical to the dict for the reserialized object. (This is what I usually test to check serialization.)

This is not the case with OpenMMTools objects, because the attributes _serialized__module_name and _serialized__class_name are injected (presumably during deserialization). This seems like a bug to me, although I suppose it might be intended as a way to mark objects as having come from serialization.

In [1]: from openmmtools import mcmc
   ...: from openmmtools.utils import serialize, deserialize

In [2]: move = mcmc.MCDisplacementMove()

In [3]: serialized = serialize(move)
   ...: deserialized = deserialize(serialized)
   ...: reserialized = serialize(deserialized)

In [4]: serialized == reserialized
Out[4]: False

In [5]: serialized
Out[5]:
{'atom_subset': None,
 'context_cache': None,
 'n_accepted': 0,
 'n_proposed': 0,
 'displacement_sigma': Quantity(value=1.0, unit=nanometer)}

In [6]: reserialized
Out[6]:
{'atom_subset': None,
 'context_cache': None,
 'n_accepted': 0,
 'n_proposed': 0,
 'displacement_sigma': Quantity(value=1.0, unit=nanometer),
 '_serialized__module_name': 'openmmtools.mcmc',
 '_serialized__class_name': 'MCDisplacementMove'}

EDIT: version info

In [1]: import openmmtools

In [2]: openmmtools.__git_revision__
Out[2]: '5b46695dc8e666bba65d9c134bf926ba08d8d6bc'

dwhswenson avatar Jul 31 '21 18:07 dwhswenson

Hi @dwhswenson . It's weird that the first serialization doesn't have the serialized_X attributes since serialize() should always add them (see here). Even weirder that deserialize() doesn't raise an exception since it cannot restore objects without the serialized_X attribute (see here).

andrrizzi avatar Aug 02 '21 06:08 andrrizzi

Ah! I see what's happening. deserialize() actually pop()s the serialized_X attributes so you don't see them anymore in the serialized variable if you print it after calling deserialize() on it.

If you need the serialization object to be preserved, we could add a shallow copy of the dictionary on top of deserialize().

andrrizzi avatar Aug 02 '21 06:08 andrrizzi

Ah! I see what's happening. deserialize() actually pop()s the serialized_X attributes so you don't see them anymore in the serialized variable if you print it after calling deserialize() on it.

Of course! I should have guessed that. When I looked again my code example, I wondered how deserialization worked in the first place.

I can add the shallow copy on my end before deserialization (so I'm closing this as solvable on my side). I'm wrapping openmmtools.mcmc as an engine for OPS, so this was important for the serialization tests on our end. However, it might still be worth the copy on your end to avoid other confusion.

(I'll post another issue soon looking for some advice on best approaches for that engine -- I think I already have the proof of concept, but there were some aspects where your input would be really helpful.)

dwhswenson avatar Aug 02 '21 09:08 dwhswenson

However, it might still be worth the copy on your end to avoid other confusion.

I agree. I'll reopen this to keep track. Also, feel free to open a PR with the fix here rather than modifying your code if you have the time.

andrrizzi avatar Aug 04 '21 07:08 andrrizzi