librascal icon indicating copy to clipboard operation
librascal copied to clipboard

structure sanitation

Open felixmusil opened this issue 3 years ago • 10 comments

Improve the handling of input structures before passing them to the neighbor list. The neighbor list requires atoms to be in the unit cell so the atoms of periodic structures are now wrapped and atoms of gas-phase structure are moved into a unit cell that bounds minimally the structure.

felixmusil avatar Mar 17 '21 23:03 felixmusil

From #307

This is the automatic workaround. The problem is that ASE sets the cell to be non-periodic if you read a file with no (or unreadable) cell information, even if you explicitly set a cell later, so you have to remember to set PBC explicitly as well. I've lost weeks of work because of this misunderstanding.

Originally posted by @max-veit in https://github.com/cosmo-epfl/librascal/pull/307#discussion_r602399740

Could we also check/set pbc flags if the structure cell is defined to be non zero?

Luthaf avatar Mar 26 '21 17:03 Luthaf

I would prefer to avoid changing the fundamental of the structure that is provided. We do it for isolated structures because there it's strictly equivalent but if the user has a cell with no pbc then it should not be changed. What @max-veit is referring to is related to ase and has nothing to do with rascal.

felixmusil avatar Mar 27 '21 15:03 felixmusil

Sice our structure reading is very coupled with ASE, I think this is very much related to librascal. We don't have to change the structure provided to improve the situation though, emitting a warning if the cell is non zero but PBC is false would already help a lot!

Luthaf avatar Mar 29 '21 07:03 Luthaf

emitting a warning if the cell is non zero but PBC is false

There is no reason to emit a warning in this case since defining a unit cell and have no PBC are unrelated in my opinion. Why should we add this convention?

felixmusil avatar Mar 29 '21 08:03 felixmusil

They are very much related in my opinion -- in principle, there's no point in defining a cell if your system is non-periodic. The only reason we need a cell is an artifact of how we implement the neighbour list. If your system is actually non-periodic, then we can safely guess a unit cell without affecting any computed properties.

I've solved the issue in #307 by explicitly requesting a periodic/non-periodic flag and making sure it's set the way the user expects, which I think is sensible for an MD driver. But aside from that, there is still an ambiguity if (1) no cell is provided (is the system non-periodic, or was a cell provided in an unreadable format?), (2) a cell is provided, but pbc is set to False (is this a mistake, e.g. setting a cell explicitly but forgetting to set pbc too? Or is it actually non-periodic?). I think we should warn the user in both cases.

max-veit avatar Mar 29 '21 16:03 max-veit

in principle, there's no point in defining a cell if your system is non-periodic. The only reason we need a cell is an artifact of how we implement the neighbor list.

I agree with you, but my point is a bit different. The periodicity is not meant to be provided through the unit cell since it is given by the PBC flags. The two pieces of information are not redundant and should be taken seriously, i.e. not assume that the user might have meant something else. So I don't think it is a good idea to infer periodicity from the unit cell.

If your system is actually non-periodic, then we can safely guess a unit cell without affecting any computed properties.

This is the behavior introduced in this PR.

(1) no cell is provided (is the system non-periodic, or was a cell provided in an unreadable format?)

To me this is for the user to work it out since it is his input whether it is read using ASE or something else.

(2) a cell is provided, but pbc is set to False

This is not a mistake in my opinion. In this PR the cell will be overwritten to fit the atoms closely because the pbc arguments are taken seriously.

felixmusil avatar Mar 30 '21 11:03 felixmusil

Should you have a look at this small PR ? Or should it be deleted ? @ceriottm @max-veit

felixmusil avatar Oct 04 '21 17:10 felixmusil

Don't have time right now but I wouldn't want this to go away - it's important and would save a lot of fiddling with non-periodic structures.

Personally I would implement the "sanitation" of non-periodic structures that don't have a cell defined in a different way - I would compute and set the "fictitious cell" non destructively, whenever the neighbor list is called on a cell with no PBC set.

It's bit wasteful, but it's better than changing in-place the structure in a way that might have unintended consequences outside the librascal call. Thoughts?

ceriottm avatar Oct 05 '21 23:10 ceriottm

I would compute and set the "fictitious cell" non destructively

Would a preliminary copy of the input object be sufficient to address your comment ?

felixmusil avatar Oct 06 '21 08:10 felixmusil

I would compute and set the "fictitious cell" non destructively

Would a preliminary copy of the input object be sufficient to address your comment ?

yes. although it'd be good to do it structure-by-structure rather than as a whole, in cases where memory is a problem

ceriottm avatar Oct 06 '21 10:10 ceriottm