librascal
librascal copied to clipboard
structure sanitation
Improve the handling of input structures before passing them to the neighbor list. The neighbor list requires atoms to be in the unit cell so the atoms of periodic structures are now wrapped and atoms of gas-phase structure are moved into a unit cell that bounds minimally the structure.
From #307
This is the automatic workaround. The problem is that ASE sets the cell to be non-periodic if you read a file with no (or unreadable) cell information, even if you explicitly set a cell later, so you have to remember to set PBC explicitly as well. I've lost weeks of work because of this misunderstanding.
Originally posted by @max-veit in https://github.com/cosmo-epfl/librascal/pull/307#discussion_r602399740
Could we also check/set pbc flags if the structure cell is defined to be non zero?
I would prefer to avoid changing the fundamental of the structure that is provided. We do it for isolated structures because there it's strictly equivalent but if the user has a cell with no pbc then it should not be changed. What @max-veit is referring to is related to ase and has nothing to do with rascal.
Sice our structure reading is very coupled with ASE, I think this is very much related to librascal. We don't have to change the structure provided to improve the situation though, emitting a warning if the cell is non zero but PBC is false would already help a lot!
emitting a warning if the cell is non zero but PBC is false
There is no reason to emit a warning in this case since defining a unit cell and have no PBC are unrelated in my opinion. Why should we add this convention?
They are very much related in my opinion -- in principle, there's no point in defining a cell if your system is non-periodic. The only reason we need a cell is an artifact of how we implement the neighbour list. If your system is actually non-periodic, then we can safely guess a unit cell without affecting any computed properties.
I've solved the issue in #307 by explicitly requesting a periodic/non-periodic flag and making sure it's set the way the user expects, which I think is sensible for an MD driver. But aside from that, there is still an ambiguity if (1) no cell is provided (is the system non-periodic, or was a cell provided in an unreadable format?), (2) a cell is provided, but pbc is set to False (is this a mistake, e.g. setting a cell explicitly but forgetting to set pbc too? Or is it actually non-periodic?). I think we should warn the user in both cases.
in principle, there's no point in defining a cell if your system is non-periodic. The only reason we need a cell is an artifact of how we implement the neighbor list.
I agree with you, but my point is a bit different. The periodicity is not meant to be provided through the unit cell since it is given by the PBC flags. The two pieces of information are not redundant and should be taken seriously, i.e. not assume that the user might have meant something else. So I don't think it is a good idea to infer periodicity from the unit cell.
If your system is actually non-periodic, then we can safely guess a unit cell without affecting any computed properties.
This is the behavior introduced in this PR.
(1) no cell is provided (is the system non-periodic, or was a cell provided in an unreadable format?)
To me this is for the user to work it out since it is his input whether it is read using ASE or something else.
(2) a cell is provided, but pbc is set to False
This is not a mistake in my opinion. In this PR the cell will be overwritten to fit the atoms closely because the pbc arguments are taken seriously.
Should you have a look at this small PR ? Or should it be deleted ? @ceriottm @max-veit
Don't have time right now but I wouldn't want this to go away - it's important and would save a lot of fiddling with non-periodic structures.
Personally I would implement the "sanitation" of non-periodic structures that don't have a cell defined in a different way - I would compute and set the "fictitious cell" non destructively, whenever the neighbor list is called on a cell with no PBC set.
It's bit wasteful, but it's better than changing in-place the structure in a way that might have unintended consequences outside the librascal call. Thoughts?
I would compute and set the "fictitious cell" non destructively
Would a preliminary copy of the input object be sufficient to address your comment ?
I would compute and set the "fictitious cell" non destructively
Would a preliminary copy of the input object be sufficient to address your comment ?
yes. although it'd be good to do it structure-by-structure rather than as a whole, in cases where memory is a problem