Should `Interchange.minimize` also modify topology positions?
Description
I used Interchange.minimize() to optimize the conformation of a small molecule only. The positions were updated in the Interchange, but not in Interchange.topology. That was a bit confusing, since I wanted to write down sdf for a minimized ligand, but I got exactly the same coordinates as in input.
I thought that it can be, that not all interchanges have the topology attribute, but I breifly checked that most of the Interchange constructors also generate topology attribute. So why not adding the following line to minimize method?
self.topology.set_positions(self.positions)
In a perfect world, this information wouldn't be duplicated 😅
My first impression is in favor of this; I think when I wrote Interchange.minimize originally, Topology.set_positions didn't exist so it was more difficult at the time. I'd like to give @j-wags @Yoshanuikabundi an opportunity to chime in if they wish
The only other solution I can think of is making it so that Interchange.positions is so much the sole source of truth that the topology attribute doesn't even store positions. This should be easy with subclassing and would be a significant change outside the scope of this particular use case. A downside is that it basically shifts the burden back onto you; minimizing a molecule and saving it out would be
interchange = sage.create_interchange(...)
interchange.minimize()
new_topology = Topology(interchange.topology)
new_topology.set_positions(interchange.positions)
new_topology.molecule(0).to_file(...)
vs. what I'm assuming you thought would work right now
interchange = sage.create_interchange(...)
interchange.minimize()
interchange.topology.molecule(0).to_file(...)
Virtual sites aren't tracked in Interchange.positions and are stripped out in minimization, so the fact that the toolkit wouldn't be able to store virtual site positions is a non-issue.
This surfaces lots of documentation gaps as well
- What happens to the output of
Interchange.topology.get_positions? (This issue) - What happens when virtual sites are present?
- Are we going to support minimization with other engines, i.e. GROMACS?
- Any other cases of these two topology attributes falling out of sync?
This problem of synchronization between interchange and interchange.topology comes up a lot - positions most commonly, but also partial charges and unit cells. I wonder if making the topology attribute private would be a reasonable thing to do? We could create an Interchange.to_topology()* method that would perform all the synchronization and return a new topology object. This would indicate pretty strongly to the user that modifying the topology won't do anything, but also give users a way to get a topology that is up-to-date with changes made to the interchange while still consolidating all synchronization code in one place that only gets called when it's needed (so it's not the end of the world if it's slow). Internal code could just use Interchange._topology.
I think that basically just encodes our expectations for what users do with the topology attribute into the API and would probably solve all our topology synchronization issues in one swoop. It's also very simple to implement and maintain. It might make operations on the molecules in an Interchange more verbose, but I'm not sure how often people use that.
* I would advocate a method and not a property to signal to the user (A) that modifying this won't modify the interchange and (B) that some work (ie, synchronization) is performed when the function is called - I saw a rule-of-thumb somewhere that properties should be O(1) and I kinda like that.