Structure symmetry in OPTIMADE

Open rartino opened this issue 6 years ago • 20 comments

We need to think a bit about how we represent symmetry in relation to structures if we are going to include that. There is indeed good reason to allow queries on symmetry data when available.

If symmetry information is given, it may be best to require the list of symmetry operations, which is what strict CIF files do to avoid ambiguity. In addition, spacegroup number, ML, and Hall symbol can also be allowed as optional fields. However, it is probably unwise to allow giving those without also giving the symmetry operations, since that would replicate issues found for less strict CIF files.

The question is how to represent symmetry operations. Some alternatives:

  1. The cif x,y,z format
  2. The cif x,y,z format, but with canonicalized individual symmetry operations (e.g., require translations to be in [0, 1), etc.).
  3. The cif x,y,z format but with the complete set of symmetry operations canonicalized, e.g., by sorting alternative 2.
  4. Some other, more compact, format we come up with.

rartino avatar Jun 14 '18 19:06 rartino

Here is an argument for requiring a fully canonicalized output:

One of the reasons for a non-canonical output was that otherwise one would frequently have to deal with implementations that return incorrect output. However, 'unsorted' symmetry operations are far from the only error an implementation can make when returning symmetry operators. One easy mistake is to return a set of symmetry operators that just cannot be correct together (because there is only a finite number of total possibilities). If we require translation coefficients to be in [0, 1), it is easy to return operations that do not fulfill this, etc. Hence, to avoid incorrect output it would be good if this field could be properly validated by our JSON schemas to belong exactly to the set of possibilities.

If we require a canonical output (e.g., by a sorted list of operations), then one of us with a framework that uses complete sets of symmetry operations can just generate a list of ALL valid combinations of symmetry operations and we can put that in a schema and say: "The output MUST be exactly one of these alternatives". This is easy to validate. An unsorted list of operations (and possibly with several possibilities for coefficients that mean the same thing) is not easy to validate.

rartino avatar Jun 15 '18 05:06 rartino

can just generate a list of ALL valid combinations of symmetry operations and we can put that in a schema and say: "The output MUST be exactly one of these alternatives".

I suspect this will be impractical since the list of combinations will be huge beyond manageability...

sauliusg avatar Nov 19 '18 15:11 sauliusg

I suspect this will be impractical since the list of combinations will be huge beyond manageability...

Perhaps I'm missing something obvious, but for fully periodic systems, is this not just one entry per hall symbol? I.e., per row in this table: http://cci.lbl.gov/sginfo/hall_symbols.html#Table_6

But, right, that validation only works for fully periodic systems. If there are non-periodic directions, the situation gets a bit more tricky. Does the cif x,y,z format even properly handle point groups in non-periodic systems? How does one represent, e.g., C∞ around an axis?

rartino avatar Nov 19 '18 23:11 rartino

On 20/11/2018 01.44, Rickard Armiento wrote:

I suspect this will be impractical since the list of combinations
will be huge beyond manageability...

Perhaps I'm missing something obvious, but for fully periodic systems, is this not just one entry per hall symbol? I.e., per row in this table: http://cci.lbl.gov/sginfo/hall_symbols.html#Table_6

When you mentioned "generate a list of ALL valid combinations of symmetry operations", this implies to me all permutations of symmetry elements, and all possible shifts. E.g. for the C2 spacegroup not only the:

X,Y,Z -X,Y,-Z 1/2+X,1/2+Y,Z 1/2-X,1/2+Y,-Z

is possible, but also:

X,Y,Z 1/2-X,1/2+Y,-Z -X,Y,-Z 1/2+X,1/2+Y,Z

and also

X,Y,Z X-1/2,Y-1/2,Z -X,Y,-Z -X+1/2,Y+1/2,-Z

and so on. N! and more combinations, too many to list.

Regards, Saulius

sauliusg avatar Nov 20 '18 07:11 sauliusg

But, right, that validation only works for fully periodic systems. If there are non-periodic directions, the situation gets a bit more tricky. Does the cif x,y,z format even properly handle point groups in non-periodic systems? How does one represent, e.g., C∞ around an axis?

CIF has provisions for some aperiodic structures but not for others.

Modulated (incommensurately modulated) structures are handled. Quasicrystals are being worked on. C∞ around an axis is not handled. One might argue that it is not compatible with the discrete atom model either, but the situation arises for disordered groups (e.g. R-CH3 rotationally disordered around the R-C bond). Some aperiodic (or even periodic) models are arguably not crystallographic, e.g. continuum models use different descriptions and are not modelled in coordinate CIF (but there is a CIF dictionary to describe electron densities).

Thus, I do not think there is a "one size fits all" solution for all possible situations; we need to address what is needed for current models (one of them being the discrete atom model), possibly leaving options for expansion.

sauliusg avatar Nov 20 '18 07:11 sauliusg

When you mentioned "generate a list of ALL valid combinations of symmetry operations", this implies to me all permutations of symmetry elements, and all possible shifts.

Sorry if this was unclear: that part referred to the list of ALL canonical lists of symmetry operations. I.e., in practical terms, a canonical list of symmetry operations puts each individual symmetry operation in a canonical form, and then the list of those symmetry operations is 'sorted' according to some well-defined order. I'm presently under the impression that this list has a manageable length in fully periodic systems.
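A minimal Python sketch of this idea, under stated assumptions: each operation is parsed, its translation is wrapped into [0, 1), it is rewritten in one fixed style, and the list is sorted by plain string order. The function names, the output style (axes first, then the shift), and the choice of string sorting are all hypothetical; nothing here is prescribed by OPTIMADE or CIF. It only handles coefficients of -1, 0, and 1.

```python
import re
from fractions import Fraction

AXES = "xyz"
# One term: optional sign, then an axis letter or a number such as 0.5 or 1/2.
TERM = re.compile(r"([+-]?)(?:([xyz])|(\d*\.\d+|\d+(?:/\d+)?))")


def parse_component(text):
    """Parse one component such as '1/2-x' into (coefficients, translation)."""
    text = text.replace(" ", "").lower()
    if not text:
        raise ValueError("empty component")
    coeffs, shift, pos = [0, 0, 0], Fraction(0), 0
    while pos < len(text):
        m = TERM.match(text, pos)
        # Every term after the first must carry an explicit sign.
        if m is None or (pos > 0 and not m.group(1)):
            raise ValueError(f"cannot parse {text!r}")
        sign = -1 if m.group(1) == "-" else 1
        if m.group(2):
            coeffs[AXES.index(m.group(2))] += sign
        else:
            shift += sign * Fraction(m.group(3))
        pos = m.end()
    return tuple(coeffs), shift % 1  # wrap the translation into [0, 1)


def format_component(coeffs, shift):
    """Write axes in x, y, z order, then the translation (assumes coefficients in -1, 0, 1)."""
    out = ""
    for axis, c in zip(AXES, coeffs):
        if c:
            out += ("-" if c < 0 else "+" if out else "") + axis
    if shift:
        out += ("+" if out else "") + str(shift)
    return out or "0"


def canonical_op(op):
    """Canonicalize a single operation given in x,y,z notation."""
    parts = [parse_component(p) for p in op.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three components in {op!r}")
    return ",".join(format_component(*p) for p in parts)


def canonical_list(ops):
    """Canonicalize every operation, drop duplicates, and sort the result."""
    return sorted({canonical_op(op) for op in ops})
```

With these assumptions, the three C2 listings quoted earlier in the thread (permuted, and with shifts of -1/2 instead of +1/2) all reduce to the same four-element list.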

Validation of a crystal_symmetry_ops (or whatever we decide to call the property for the canonical list of symmetry operations) is then easy: one uses a JSON Schema 'enum' to check its value against the list of ALL valid canonical lists of symmetry operations. To actually be able to validate this field seems like a huge benefit to me, as I've struggled with incomplete/incorrect symmetry information in cif files in the past.
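As a rough illustration of the 'enum' idea, the sketch below builds a schema fragment as a Python dict and emulates the `enum` keyword with a plain membership test instead of calling a validator library. The two-entry table is a hypothetical placeholder (only P 1 and -P 1, in an assumed canonical order); a real table would need one canonical list per tabulated setting.

```python
import json

# Hypothetical, heavily truncated table of canonical operation lists.
CANONICAL_LISTS = [
    ["x,y,z"],                # P 1
    ["-x,-y,-z", "x,y,z"],    # -P 1, sorted by plain string order (an assumption)
]

# Schema fragment for the property: "enum" demands an exact match
# against one of the listed alternatives.
SCHEMA = {
    "type": "array",
    "items": {"type": "string"},
    "enum": CANONICAL_LISTS,
}

# The fragment is plain data, so it serializes directly to JSON.
SCHEMA_JSON = json.dumps(SCHEMA, indent=2)


def is_valid(value):
    """Emulate the "enum" keyword: the value must equal one alternative exactly."""
    return isinstance(value, list) and value in SCHEMA["enum"]
```

Note that a correct but unsorted list fails this check, which is exactly why the approach depends on a fully canonical ordering.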

Regarding (partially)-non-periodic structures, if possible, I would really prefer one single scheme for everything that fits the OPTIMaDe structure object; i.e., a collection of atoms; not, e.g., electron densities. If that isn't possible, I guess we'd have to divide into cases as you are describing for CIF.

But, is there really no standardized syntax for just listing symmetry groups in group theory form (e.g., C2v, D3h, C∞, etc.) and the axes/planes they operate on? Because it seems to me that would work universally. And such lists can be validated to some degree.

rartino avatar Nov 20 '18 09:11 rartino

I gather that there are standardization problems with symmetry operator lists (I miss discussion on floating point vs. fraction representation of translation components). However, maybe we could standardize on space group ITC number or Hall symbol and introduce symmetry operators later?

merkys avatar May 30 '22 14:05 merkys

Not sure if this is the right page, but if anyone is looking for a symmetry code, aflow offers a nice suite of symmetry symbols, operations, Wyckoff positions, etc. Can handle inputs in cif, VASP, QE, ABINIT, Elk, FHI-AIMS, and ATAT. Functionality is accessible locally (with the aflow binary installed) or through the web: http://aflow.org/aflow-online/ Full description of capabilities: https://doi.org/10.1107/S2053273318003066

corey$ xzcat POSCAR.relax2.xz | aflow --edata
AFLOW VERSION 3.2.11: [aflow.org consortium - 2003-2021]
REAL LATTICE
Real space lattice:
  1.9809e+00 -3.4310e+00 0.0000e+00
  1.9809e+00 3.4310e+00 0.0000e+00
  0.0000e+00 0.0000e+00 7.0610e+00
Real space a b c alpha beta gamma: 3.961720505 3.961720505 7.060999488 90 90 120
Real space a b c alpha beta gamma: 7.486566198 7.486566198 13.34335424 90 90 120 Bohrs/Degs
Real space Volume: 95.98
Real space c/a = 1.782
BRAVAIS LATTICE OF THE CRYSTAL (pgroup_xtal)
Real space Bravais Lattice Primitive = HEX
Real space Lattice Variation = HEX
Real space Lattice System = hexagonal
Real space Pearson Symbol = hP4
POINT GROUP CRYSTAL
Real space Crystal Family = hexagonal
Real space Crystal System = hexagonal
Real space Crystal Class = dihexagonal-dipyramidal
Real space Point Group (Hermann Mauguin) = 6/mmm
Real space Point Group (Schoenflies) = D_6h
Real space Point Group Orbifold = *622
Real space Point Group Type = centrosymmetric
Real space Point Group Order = 24
Real space Point Group Structure = 2 x dihedral
SPACE GROUP OF THE CRYSTAL
Space group number = 194
Space group label (Hermann Mauguin) = P6_{3}/mmc
Space group label (Hall) = -P 6c 2c
Space group label (Schoenflies) = D_{6h}^{4}
Laue class = 6/mmm
Crystal class = 6/mmm
ITC REPRESENTATION OF THE CRYSTAL
Setting = 1
Origin = 0.0000e+00 0.0000e+00 0.0000e+00
General Wyckoff position
  1 x,y,z
  2 -y,x-y,z
  3 -x+y,-x,z
  4 -x,-y,z+1/2
  5 y,-x+y,z+1/2
  6 x-y,x,z+1/2
  7 y,x,-z
  8 x-y,-y,-z
  9 -x,-x+y,-z
  10 -y,-x,-z+1/2
  11 -x+y,y,-z+1/2
  12 x,x-y,-z+1/2
  13 -x,-y,-z
  14 y,-x+y,-z
  15 x-y,x,-z
  16 x,y,-z+1/2
  17 -y,x-y,-z+1/2
  18 -x+y,-x,-z+1/2
  19 -y,-x,z
  20 -x+y,y,z
  21 x,x-y,z
  22 y,x,z+1/2
  23 x-y,-y,z+1/2
  24 -x,-x+y,z+1/2
Representative Wyckoff positions
  0.00000000000000 0.00000000000000 0.00000000000000 A 2 a -3m.
  0.33333333333333 0.66666666666667 0.25000000000000 B 2 c -6m2
WYCCAR
ClNa_sv/220 - (220) [AB] (220) (
1.000000
  0.00000000000000 -3.96172050502749 0.00000000000000
  3.43095060004752 1.98086025251375 0.00000000000000
  0.00000000000000 0.00000000000000 7.06099948832998
2 2
Direct(4) [A2B2]
  0.00000000000000 0.00000000000000 0.00000000000000
  0.00000000000000 0.00000000000000 0.50000000000000
  0.33333333333333 0.66666666666667 0.25000000000000
  0.66666666666667 0.33333333333333 0.75000000000000
BRAVAIS LATTICE OF THE LATTICE (pgroup)
Real space Bravais Lattice Primitive = HEX
Real space Lattice Variation = HEX
Real space Lattice System = hexagonal
SUPERLATTICE (equally decorated)
Superlattice lattice:
  1.9809e+00 -3.4310e+00 0.0000e+00
  1.9809e+00 3.4310e+00 0.0000e+00
  0.0000e+00 0.0000e+00 7.0610e+00
Real space a b c alpha beta gamma: 3.961720505 3.961720505 7.060999488 90 90 120
Real space a b c alpha beta gamma: 7.486566198 7.486566198 13.34335424 90 90 120 Bohrs/Degs
Real space Volume: 95.98
Real space c/a = 1.782
Real space Bravais Superlattice Primitive = HEX
Real space Superlattice Variation = HEX
Real space Superlattice System = hexagonal
Real space Pearson Symbol Superlattice = hP4
RECIPROCAL LATTICE
Reciprocal space lattice:
  1.5860e+00 -9.1566e-01 0.0000e+00
  1.5860e+00 9.1566e-01 0.0000e+00
  0.0000e+00 0.0000e+00 8.8984e-01
Reciprocal space a b c alpha beta gamma: 1.831324912 1.831324912 0.8898436146 90 90 60
Reciprocal space Volume: 2.584
Reciprocal lattice primitive = HEX
Reciprocal lattice variation = HEX
SPRIM
ClNa_sv/220 - (220) [AB] (220) ( [HEX,HEX,hP4] (STD_PRIM doi:10.1016/j.commatsci.2010.05.010)
1.000000
  1.98086025251375 -3.43095060004752 0.00000000000000
  1.98086025251375 3.43095060004752 0.00000000000000
  0.00000000000000 0.00000000000000 7.06099948832998
2 2
Direct(4) [A2B2]
  0.00000000000000 0.00000000000000 0.00000000000000
  0.00000000000000 0.00000000000000 0.50000000000000
  0.33333333333333 0.66666666666667 0.25000000000000
  0.66666666666667 0.33333333333333 0.75000000000000
SCONV
ClNa_sv/220 - (220) [AB] (220) ( [HEX,HEX,hP4] (STD_CONV doi:10.1016/j.commatsci.2010.05.010)
1.000000
  1.98086025251375 -3.43095060004752 0.00000000000000
  1.98086025251375 3.43095060004752 0.00000000000000
  0.00000000000000 0.00000000000000 7.06099948832998
2 2
Direct(4) [A2B2]
  0.00000000000000 0.00000000000000 0.00000000000000
  0.00000000000000 0.00000000000000 0.50000000000000
  0.33333333333333 0.66666666666667 0.25000000000000
  0.66666666666667 0.33333333333333 0.75000000000000

coreyoses avatar May 31 '22 16:05 coreyoses

On the discussion of implementations - we concluded recently in a discussion that what would be useful is a very permissively licensed (e.g. CC0) implementation of a Hall symbol <-> symmetry operator code, since this is a difficult mapping that will be relevant for a number of API implementations.

rartino avatar May 31 '22 16:05 rartino

On the discussion of implementations - we concluded recently in a discussion that what would be useful is a very permissively licensed (e.g. CC0) implementation of a Hall symbol <-> symmetry operator code, since this is a difficult mapping that will be relevant for a number of API implementations.

Gemmi library (MPLv2 or LGPLv3) seems to be able to do so.

merkys avatar May 31 '22 16:05 merkys

aflow is GPL, which is compatible with CC0.

coreyoses avatar May 31 '22 16:05 coreyoses

aflow is GPL, which is compatible with CC0.

Sadly, this is not true. GPL is only compatible with GPL.

merkys avatar May 31 '22 16:05 merkys

this is news to me, their FAQs say otherwise.

https://creativecommons.org/faq/#Can_I_apply_a_Creative_Commons_license_to_software.3F

Also, the CC0 Public Domain Dedication is GPL-compatible and acceptable for software. For details, [see the relevant CC0 FAQ entry](https://wiki.creativecommons.org/wiki/CC0_FAQ#May_I_apply_CC0_to_computer_software.3F_If_so.2C_is_there_a_recommended_implementation.3F).

Yes, CC0 is suitable for dedicating your copyright and related rights in computer software to the public domain, to the fullest extent possible under law. [Unlike CC licenses, which should not be used for software](https://wiki.creativecommons.org/wiki/Frequently_Asked_Questions#Can_I_use_a_Creative_Commons_license_for_software.3F), CC0 is compatible with many software licenses, [including the GPL](https://www.gnu.org/licenses/license-list.html#CC0). However, CC0 has not been approved by the [Open Source Initiative](https://opensource.org/) and does not license or otherwise affect any patent rights you may have. You may want to consider using an approved OSI license that does so instead of CC0, such as [GPL 3.0](https://opensource.org/licenses/GPL-3.0) or [Apache 2.0](https://opensource.org/licenses/Apache-2.0).

is this one way?

coreyoses avatar May 31 '22 19:05 coreyoses

This citation says that you can use CC0-licensed code inside GPL-licensed code, not the other way round.

merkys avatar May 31 '22 21:05 merkys

@rartino should this be closed by https://github.com/Materials-Consortia/OPTIMADE/pull/405?

blokhin avatar Jun 07 '22 19:06 blokhin

@blokhin I think this depends on whether space group number and Hall symbol are enough to uniquely define any set of symmetry operators (I tend to think they are not). If not, this issue should remain open until symmetry operator list representation is standardized in OPTIMADE.

merkys avatar Jun 07 '22 19:06 merkys

@blokhin @merkys Indeed - let's keep this open until we've sorted out what to do about representing the full set of symmetry operators. My wish is to find a canonical format that is fairly compact, trivially translatable to a set of symmetry operator matrices, and with the ability to represent any set of symmetry operators, including those relevant for slabs, wires, and molecules. Perhaps also the ability to indicate a tensor transform for the symmetry operation to represent magnetism, etc.

rartino avatar Jun 07 '22 20:06 rartino

I'd stick to the IUCr definition; however, there is probably no need to require symops for the standard settings / origins. So if they are omitted, one should assume ITC.

blokhin avatar Jun 07 '22 23:06 blokhin

Not sure if this is the right page, but if anyone is looking for a symmetry code, aflow offers a nice suite of symmetry symbols, operations, Wyckoff positions, etc. Can handle inputs in cif, VASP, QE, ABINIT, Elk, FHI-AIMS, and ATAT. Functionality is accessible locally (with the aflow binary installed) or through the web: http://aflow.org/aflow-online/ Full description of capabilities: https://doi.org/10.1107/S2053273318003066

Thanks for the link, this is a very interesting implementation, I'll have a closer look!

It is definitely very useful as a sample implementation of symmetry handling.

For the OPTIMADE we need, I think, a standardised way to represent symmetry operators, so that any standard-compliant implementation can handle them without conversion.

sauliusg avatar Jun 15 '22 09:06 sauliusg

If we require a canonical output (e.g., by a sorted list of operations), then one of us with a framework that uses complete sets of symmetry operations can just generate a list of ALL valid combinations of symmetry operations and we can put that in a schema and say: "The output MUST be exactly one of these alternatives". This is easy to validate.

IMHO such a level of standardisation is unnecessary. Validation can be done easily without this (see below), and managing the list of all alternatives in a standard is overkill and too much work, with virtually no added value.

The symmetry operators are intended to be parsed first of all, and probably few searches will be done on the symop property.

If someone does implement a search on symops, then it is very easy to canonicalise them internally, inside the server (we do so in the COD). The standard (OPTIMADE) just needs to mandate that all symops are faithfully converted to matrices, and all operator sets with the given matrices are found (matched). No need to spell out a list of alternatives for this.

An unsorted list of operations (and possibly with several possibilities for coefficients that mean the same thing) is not easy to validate.

Not true.

Since symops are primarily intended to be parsed, there needs to be a grammar for this. AFAIK, there is no "official" grammar for the symmetry operators (I could not find a definitive reference to the Jones faithful representation mentioned in Hall's 1981 paper), but there seems to be a community consensus on what these operators look like. Thus, we can easily write a BNF grammar to codify the current practices.

I am prepared to write an EBNF syntax for the symmetry operators. When the grammar is in place, then validation of the symops is easy:

  1. check that the syntax is correct (formal);
  2. check that the matrices obtained by parsing the symops are crystallographic.

Step (2) is probably even unnecessary for protocol validation since it is a semantic check.
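For reference, a semantic check along the lines of step (2) could look roughly like the Python sketch below. It assumes the symop has already been parsed into an integer 3x3 rotation part and a translation vector, and it tests only necessary per-operation conditions: the rotation part must have order 1, 2, 3, 4, or 6, and the translation is assumed (not required by the thread) to be normalized to [0, 1). It does not check that a whole set of operations forms a group. All names are hypothetical.

```python
from fractions import Fraction

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def matmul(a, b):
    """Multiply two 3x3 matrices given as tuples of tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rotation_order(w):
    """Return the smallest n <= 6 with w**n == identity, or None if there is none."""
    w = tuple(tuple(row) for row in w)
    power = w
    for n in range(1, 7):
        if power == IDENTITY:
            return n
        power = matmul(power, w)
    return None


def is_crystallographic(w, t):
    """Necessary conditions only: integer rotation part of finite order, translation in [0, 1)."""
    if any(not isinstance(x, int) for row in w for x in row):
        return False
    if rotation_order(w) is None:
        return False
    return all(0 <= Fraction(x) < 1 for x in t)
```

For example, the rotation part of `x-y,x,z+1/2` is a sixfold axis and passes, while a shear such as `x+y,y,z` has infinite order and fails.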

sauliusg avatar Jun 16 '22 04:06 sauliusg

While writing EBNF for the symmetry operation syntax, it occurred to me that what we are describing is a regular language, and so can be defined using a regular expression.

I have checked the regexps for 3D symmetry operations, and the whole COD can be checked using the following ones:

/^([-+]?[xyz]([-+][xyz])?([-+]([1-9]\/[0-9]|[0-9](\.[0-9]+)?|\.[0-9]+))?|[-+]?([1-9]\/[0-9]|[0-9](\.[0-9]+)?|\.[0-9]+)([-+][xyz]([-+][xyz])?)?)(,([-+]?[xyz]([-+][xyz])?([-+]([1-9]\/[0-9]|[0-9](\.[0-9]+)?|\.[0-9]+))?|[-+]?([1-9]\/[0-9]|[0-9](\.[0-9]+)?|\.[0-9]+)([-+][xyz]([-+][xyz])?)?)){2}$/

Most COD symops match this regexp, and those that do not are clearly defective.

This is of course a hard-to-read RE, but it can be typeset comfortably using the Perl (PCRE) /x option, which allows arbitrary white space to be used for formatting the code. Moreover, we could use variables with informative names to define recurring parts of the RE.

The advantages of the RE would be:

  1. simple to write;
  2. nearly universally supported;
  3. practical symmetry operation analysis can be performed by taking the REs directly from the specification and using parentheses "()" to capture matching parts; thus no need for a separate parser package;
  4. works in (nearly) any programming language.

Disadvantages:

  • hard to read (but can be formatted to increase readability);
  • hard to identify where the syntax error in a non-matching string occurs;
  • unusual for some people?

If we go for defining symops using RE in OPTIMADE, we could use the Jones faithful notation, have mathematically exact definition for it and at the same time have easy means to parse the symops.

My questions:

  • do we want REs to define symmetry operators in OPTIMADE?
  • do we want, in addition to REs, also an EBNF definition?

sauliusg avatar May 04 '23 11:05 sauliusg

This is of course a hard-to-read RE, but it can be typeset comfortably using the Perl (PCRE) /x option, which allows arbitrary white space to be used for formatting the code. Moreover, we could use variables with informative names to define recurring parts of the RE.

This is a space-delimited version of the symop matching regexp, for better readability:

/^
 (
   [-+]? [xyz] ([-+][xyz])? ([-+] ([1-9]\/[0-9] | [0-9](\.[0-9]+)? | \.[0-9]+) )?
   |   # ^-- matches coords ^--- matches either a rational number, such as 1\/2, or a real number, e.g. 2 or 0.5
   [-+]? ([1-9]\/[0-9]|[0-9](\.[0-9]+)?|\.[0-9]+) ([-+] [xyz] ([-+][xyz])? )?
   # NB: COD *does* contain translations with decimal point, e.g. 0.5, they came from the published CIFs.
 )
 (,
   (
     [-+]? [xyz] ([-+][xyz])? ([-+] ([1-9]\/[0-9] | [0-9](\.[0-9]+)? | \.[0-9]+) )?
     |   # ^-- repetition of the patterns given above; can (should?) be defined as variables.
     [-+]? ([1-9]\/[0-9] | [0-9](\.[0-9]+)? | \.[0-9]+) ([-+][xyz] ([-+][xyz])? )?
   )
 ){2}
 $
/x
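For comparison, here is a hedged Python rendering of the same idea, using `re.VERBOSE` (Python's counterpart of `/x`) and named variables for the recurring parts. The variable names are hypothetical, and the pattern mirrors the lenient COD-oriented syntax above (lowercase axes, fractions or decimals), with the repeated component defined once and reused for all three positions.

```python
import re

# Translation: a fraction such as 1/2, or a decimal such as 2, 0.5 or .5.
NUMBER = r"(?: [1-9]/[0-9] | [0-9](?:\.[0-9]+)? | \.[0-9]+ )"

# One or two signed axis letters, e.g. "x", "-y" or "x-y".
AXES = r"(?: [-+]?[xyz] (?:[-+][xyz])? )"

# A component is either "axes [+/- number]" or "number [+/- axes]".
COMPONENT = rf"""(?:
    {AXES} (?: [-+] {NUMBER} )?
  |
    [-+]? {NUMBER} (?: [-+][xyz] (?:[-+][xyz])? )?
)"""

# Three comma-separated components; fullmatch() anchors both ends.
SYMOP = re.compile(rf"{COMPONENT} (?: , {COMPONENT} ){{2}}", re.VERBOSE)


def is_symop(text):
    """Return True if the string matches the lenient x,y,z syntax."""
    return SYMOP.fullmatch(text) is not None
```

Since the pattern is case-sensitive, uppercase operations such as `X,Y,Z` would need to be lowercased first, or the character classes extended.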

sauliusg avatar May 04 '23 12:05 sauliusg

@sauliusg I strongly support the regex idea.

However, I would also suggest that we limit the numeric part to rational numbers in the form of fractions, i.e. "1/2", but not "0.5" or "2", etc. since proper crystallographic symmetry operations can always be translated to this form. Removing the "real number" part would simplify the regular expression and avoid some ambiguities with the precision (i.e. if we get "0.333" should it be applied as is, or should it be further extrapolated into a more precise number like "0.333333").
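The precision ambiguity can be illustrated with Python's standard `fractions` module. This is only a sketch: the maximum denominator of 6 used for "snapping" is an assumption made for the example, not something the format defines.

```python
from fractions import Fraction

# Taken literally, the decimal string is exactly 333/1000 ...
as_written = Fraction("0.333")

# ... while snapping to the nearest fraction with a small denominator
# (the bound of 6 is an assumption) recovers 1/3.
snapped = as_written.limit_denominator(6)

# A fraction string carries no such ambiguity.
exact = Fraction("1/3")
```

Here `as_written` and `snapped` are different numbers, so a consumer has to guess which one the producer meant; with `"1/3"` there is nothing to guess.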

Also, the fractional number part should probably be "[1-9]/[1-9]" and not "[1-9]/[0-9]" (no 0 in the denominator).

vaitkus avatar May 05 '23 13:05 vaitkus

@sauliusg I strongly support the regex idea.

However, I would also suggest that we limit the numeric part to rational numbers in the form of fractions, i.e. "1/2", but not "0.5" or "2", etc. since proper crystallographic symmetry operations can always be translated to this form. Removing the "real number" part would simplify the regular expression and avoid some ambiguities with the precision (i.e. if we get "0.333" should it be applied as is, or should it be further extrapolated into a more precise number like "0.333333").

Also, the fractional number part should probably be "[1-9]/[1-9]" and not "[1-9]/[0-9]" (no 0 in the denominator).

That's true. The provided REs were the ones that a) can capture the current interpretable symmetry operators in the COD and b) demonstrate what the symop REs would look like. The OPTIMADE specification should of course mandate a strict(er) representation of symops. Not only do we not want 0 in the denominator; we can also explicitly list '[2346]' as permitted denominators, since only such axes are allowed in 3D space groups. Or we could list all permissible shifts explicitly ('(1/2|[12]/3|[1-3]/4|[1-5]/6)'). This still allows "incorrect" (i.e. non-crystallographic) symmetry operators to be specified, but limits the syntax to some reasonable subset.
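A stricter variant along these lines might look as follows in Python. This is a sketch, not the text of any specification: it plugs the explicit shift list quoted above into the same component structure, and the variable names are hypothetical. Note that the shift list as written still admits unreduced fractions such as 2/4 or 3/6.

```python
import re

# Only the explicitly listed shifts are allowed as translations.
SHIFT = r"(?:1/2|[12]/3|[1-3]/4|[1-5]/6)"
AXES = r"[-+]?[xyz](?:[-+][xyz])?"

# Same two orderings as before: "axes [+/- shift]" or "shift [+/- axes]".
COMPONENT = rf"(?:{AXES}(?:[-+]{SHIFT})?|[-+]?{SHIFT}(?:[-+][xyz](?:[-+][xyz])?)?)"
STRICT_SYMOP = re.compile(rf"{COMPONENT}(?:,{COMPONENT}){{2}}")


def is_strict_symop(text):
    """Return True if the string matches the stricter, fraction-only syntax."""
    return STRICT_SYMOP.fullmatch(text) is not None
```

Under this pattern, decimal translations, zero denominators, and denominators other than 2, 3, 4, and 6 are all rejected at the syntax level.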

We (I?) need to look at the permissible axes in 4D crystallographic symmetry operations; this would limit symops for modulated crystals and some quasicrystals.

sauliusg avatar Jun 05 '23 08:06 sauliusg

PR #464 has been opened to add the REs for crystallographic 3D symmetry operators.

merkys avatar Jun 08 '23 09:06 merkys

In view of #464, please let's define explicitly once again the MUST-SHOULD-MAY-etc. support level for symmetry ops.

I tend towards MAY or even weaker, because we as crystallographers are already supposed to have our crystalline structures standardized according to the IUCr / tables. Note that providing symmetry ops for a standardized structure is redundant, and the majority (all?) of the modern frameworks assume your structure is standard by definition, i.e. providing the symmetry ops is optional.

For instance, consider space group number 62: should that mean Pnma or Pbnm? Likewise for the settings / origins. The standard gives a clear answer for that (namely, Pnma), and we in Pauling File / MPDS always supply our structures ONLY according to the standard as implemented in STRUCTURE TIDY, even if originally they were reported differently.

blokhin avatar Jun 09 '23 11:06 blokhin

Closed by #405 & #464.

We can continue the discussion of awkward cases in the relevant issues (or open new ones):

  • #477

ml-evs avatar Apr 04 '24 14:04 ml-evs