OPTIMADE
Distinguish experimental structures from theoretical
As suggested by @BobHanson, there should be a standard means to distinguish between experimental and theoretical structures. This could be a property with boolean/enum values. I would suggest a "MUST" level of support (maybe even for queries), as I believe this bit of information should always be available.
Could we get away with defining a new enum value in structure_features for this?
Sounds good to me. Should it be "theoretical" for theoretical structures? Or rather "experimental" for experimental ones? Which is more natural? For me it is "theoretical", but I come from an experimental background, hence my bias :smile:
I think we will need to have both theoretical and experimental. That way we can implicitly also have the case where it is undefined, which may be useful for old entries for which it was not recorded whether it is a theoretical or experimental structure, or simply for databases that have not been updated.
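A minimal sketch of how a client might read such flags, assuming (hypothetically) that the values "experimental" and "theoretical" were added to structure_features. Neither value is defined for that field at the time of this discussion; the function name and the decision to reject entries carrying both flags are illustrative assumptions only.

```python
from typing import Iterable, Optional

# Hypothetical flag values; they are only being proposed in this thread.
EXPERIMENTAL = "experimental"
THEORETICAL = "theoretical"


def classify_origin(structure_features: Iterable[str]) -> Optional[str]:
    """Return "experimental", "theoretical", or None when neither flag is set."""
    flags = set(structure_features)
    has_experimental = EXPERIMENTAL in flags
    has_theoretical = THEORETICAL in flags
    if has_experimental and has_theoretical:
        # Assumption: the thread does not say what both flags together would mean,
        # so this sketch treats that combination as an error.
        raise ValueError("entry is flagged as both experimental and theoretical")
    if has_experimental:
        return EXPERIMENTAL
    if has_theoretical:
        return THEORETICAL
    return None  # undefined: old entry, or a database that has not been updated


print(classify_origin(["disorder", "theoretical"]))
print(classify_origin(["disorder"]))
```

The third, implicit state falls out of the absence of both flags, so no extra value is needed for "unknown".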
What you are proposing does stretch the meaning of the structure_features field, which is defined as: A list of strings that flag which special features are used by the structure.
Perhaps this is also a good moment to think about how we want to include more detailed information about how the structure was generated, especially information that would be interesting for ~~generating~~ querying structures: X-ray or neutron scattering, ab initio calculations, the software package that was used, etc.
It is definitely a property of a data element (one element of the array, as opposed to the overall set of records). I agree that it is not something to add on to some existing "structure features" string. It's more important than that. How about a new key called "nature" within data:
data[i].nature: {"experimental"|"theoretical"}
Reading @JPBergsma's and @BobHanson's responses, I am now leaning towards a separate property. It could actually provide more information about the origin of a structure. In the COD, we have a CIF data item _cod_struct_determination_method with the following possible values: single crystal, powder diffraction and theoretical. Maybe something similar could be introduced into OPTIMADE.
This sounds great to me. But can you have theoretical PD (powder diffraction)? Are there two concepts here?
data[i].nature: {"experimental"|"theoretical"}
data[i].method: {"single crystal diffraction"|"powder diffraction"}
Right. Then these should be separate properties.
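A minimal sketch of the two separate properties, using the names nature and method and the values floated above. None of this is part of OPTIMADE; the class name and the choice to make method optional are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Values as floated in this thread; neither property exists in OPTIMADE.
NATURES = {"experimental", "theoretical"}
METHODS = {"single crystal diffraction", "powder diffraction"}


@dataclass(frozen=True)
class StructureOrigin:
    """Hypothetical container that keeps 'nature' and 'method' independent."""

    nature: str
    method: Optional[str] = None  # assumption: method may be left unset

    def __post_init__(self) -> None:
        if self.nature not in NATURES:
            raise ValueError(f"unknown nature: {self.nature!r}")
        if self.method is not None and self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")


# Because the fields are independent, any combination can be expressed,
# including the "theoretical PD" case raised above.
measured = StructureOrigin("experimental", "single crystal diffraction")
simulated = StructureOrigin("theoretical", "powder diffraction")
print(measured)
print(simulated)
```

Keeping the two axes apart means a new method value never forces a change to the experimental/theoretical distinction, and vice versa.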
How should we name such a property? Some suggestions:
- nature
- origin
- determination_method
Personally, nature does not sound immediately clear to me, and origin might also be quite ambiguous.
Yes, I also would not know a good name for this distinction. From the suggestions above I found determination_method the clearest. But perhaps we can also name it simply experimental_or_theoretical.
experimental_method?
@JPBergsma: experimental_or_theoretical will need renaming should more enumerator values be introduced (e.g., mixed or something else).
@BobHanson: "experimental_method": "experimental" sounds slightly wonky to me.
Ah, right. This was in reference to
experimental_method: {single crystal diffraction | powder diffraction | ...}
investigation_type: {experimental | theoretical}
brainstorming...
Following the discussion with @sauliusg I can also point out many edge cases where the experimental or theoretical nature is not immediately clear. An example is de-novo crystal structure refinement, see [1], [2], and many more.
cf. computational experiments vs. experimental modeling
I am not voting for "computational experiment". I understand the desire to consider computational approaches "experiments", but I think this usage is not well understood.
experimental_or_theoretical will need renaming should more enumerator values be introduced (e.g., mixed or something else).
Yes, you are right, that is not convenient.
How about method_class or method_category? That allows another field named method that holds a more specific term for the procedure used to generate the data.
Indeed, there is a whole spectrum of methods ranging from purely experimental (can we actually get coordinates without any theoretical assumptions?) to purely theoretical. We probably would need a separate ontology just to identify where a structure sits in that spectrum.
But for our purposes I suggest not reinventing the wheel or overcomplicating. Go with the ICSD conception here. Keep it simple. Maybe allow for some ambiguous third category, but don't insist that every conceivable possibility is covered.
@BobHanson Is there a link for the said ICSD conception?
I understand the desire to add something ASAP to help distinguish experimental and theoretical structural data. However, I'd suggest being careful not to over-design this interface, since it is debatable whether this info even belongs in this endpoint. Going forward, we won't be able to stuff all possible experimental and theoretical details related to a structure into the structures endpoint and, at least for theoretical structures, I believe our consensus is that things like "which method", etc., belong in the calculations endpoint with relationships to "input" and "output" structures.
Hence, I suggest that this just be a simple boolean field, experimental, defined to be True if, and only if, the structural data, including the atomic coordinates, represented by the structure have been obtained more or less directly from an experiment, so that the crystal structure can reasonably be understood to have been observed in nature.
The alternative, False, just means that the structure has been obtained some other way. E.g., hypothetical structures through substitutions (perhaps including DFT relaxations, etc., but not necessarily), structure prediction algorithms, just random initialization, etc. No guarantees that these structures "make sense".
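A minimal sketch of how a client could use such a flag, assuming a hypothetical boolean attribute named experimental on each structures entry. The field is only proposed here, the sample entries are made up for illustration, and excluding entries that lack the flag is an assumption of this sketch rather than part of the proposal.

```python
from typing import Any, Dict, List

# Made-up entries shaped like structures resources;
# 'experimental' is the proposed (hypothetical) boolean field.
entries: List[Dict[str, Any]] = [
    {"id": "a", "attributes": {"experimental": True}},
    {"id": "b", "attributes": {"experimental": False}},
    {"id": "c", "attributes": {}},  # flag not provided by the database
]


def experimental_only(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep entries whose flag is explicitly True; a missing flag is excluded."""
    return [e for e in items if e["attributes"].get("experimental") is True]


selected_ids = [e["id"] for e in experimental_only(entries)]
print(selected_ids)
```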
(Or is there a very strong desire to also distinguish theoretical structures that the database strongly believes are at, or very close to, the convex hull of stability? This, I believe, is the ICSD criterion for inclusion.)
Unfortunately it starts to be complicated here. Imagine we take an experimental structure and relax it fully with DFT, ending up with a different cell, symmetry, atomic positions, etc. Is the structure still experimental?
@blokhin
It was my intent to mostly avoid this complexity by a single stringent definition separating everything into "directly from experiment" vs. other. My definition above was meant to say that your example is not an experimental structure.
Following the discussion with @sauliusg I can also point out many edge cases where the experimental or theoretical nature is not immediately clear. An example is de-novo crystal structure refinement, see [1], [2], and many more.
I think my suggestion was for actual vs hypothetical which maybe makes this slightly clearer (though shifts the vagueness elsewhere, e.g. whether a DFT database that simply took experimental structures and calculated band gaps without relaxing should report itself as hypothetical or actual).
The two relevant axes for filtering seem to me to be whether something has actually been made, and whether the structure is simply the result of minimising or sampling a Hamiltonian.
Sorry -- that ICSD paper reference: https://journals.iucr.org/j/issues/2019/05/00/in5024/index.html and supporting information
Noting that there is a discussion of this in matsci.org https://matsci.org/t/how-is-the-theoretical-tag-determined/3527
So perhaps the boolean "theoretical" is appropriate (matching ICSD). But this post does point out the same issue: it is not always possible to distinguish. I think one would just have to trust repositories to do their best job here. AFLOW could distinguish (perhaps?) between their ICSD entries (which are presumably NOT theoretical) and their calculations. @ @.*** (Cormac)
I do feel strongly that there MUST be some sort of flag regarding this. Serving up purely calculated structures is not the same as delivering x-ray crystallographic results. This is a widespread, growing issue throughout the data world. My recommendation: keep it simple.
Bob
Having read the discussion, I tend to agree with those of you favoring a single boolean flag. The question now is where to draw the line. However, neither the ICSD paper nor the related discussion on matsci.org provides clear criteria (thanks to @BobHanson for the links, though). @vaitkus, maybe the IUCr has put up some criteria?
I am a bit skeptical regarding the structures relationships with calculations though. In TCOD we have theoretically calculated structures from journal publications, but usually machine-readable metadata related to actual calculations is scarce (but reported in human-readable publications). Thus if calculations entries become mandatory for theoretical structures, we would not be able to return much meaningful data in them.
@merkys, as far as I know, the IUCr does not have any such criteria.
However, the ICSD paper lists three types of subclasses of theoretical structures:
- Predicted (non-existing) crystal structure.
- Optimized (existing) crystal structure.
- Combination of theoretical and experimental structure.
Based on this, I would say that according to them anything that is not purely experimental is classified as theoretical.
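Read that way, the classification reduces to a single test. A minimal sketch, where the enum and function names are hypothetical and only the three subclass labels follow the list from the ICSD paper above:

```python
from enum import Enum


class Provenance(Enum):
    """Hypothetical labels; the three theoretical subclasses follow the list above."""

    EXPERIMENTAL = "purely experimental"
    PREDICTED = "predicted (non-existing) crystal structure"
    OPTIMIZED = "optimized (existing) crystal structure"
    COMBINED = "combination of theoretical and experimental structure"


def is_theoretical(provenance: Provenance) -> bool:
    """Anything that is not purely experimental counts as theoretical."""
    return provenance is not Provenance.EXPERIMENTAL


for p in Provenance:
    print(f"{p.name}: theoretical={is_theoretical(p)}")
```

Under this reading, a DFT-relaxed experimental structure (the OPTIMIZED case) is reported as theoretical, which matches the stringent "directly from experiment vs. other" definition discussed earlier.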
I think there might be difficulties in drawing the line between refinement with statistical potentials, forcefields and DFT.
@merkys
I am a bit skeptical regarding the structures relationships with calculations though. In TCOD we have theoretically calculated structures from journal publications, but usually machine-readable metadata related to actual calculations is scarce (but reported in human-readable publications). Thus if calculations entries become mandatory for theoretical structures, we would not be able to return much meaningful data in them.
I don't think anyone proposed to make them mandatory for theoretical entries? Just that if you have data or metadata related to the calculation itself for, say, a calculation that started from one structure and resulted in a couple of output structures, that data would better belong under the calculations endpoint than being stored alongside the structures. For example, if you want to provide details on cutoffs, k-point grids, DFT functionals, etc.
I don't think anyone proposed to make them mandatory for theoretical entries?
No. I mistakenly assumed this was the suggested solution for telling experimental structures from theoretical.
Just that if you have data or metadata related to the calculation itself for, say, a calculation that started from one structure and resulted in a couple of output structures, that data would better belong under the calculations endpoint than being stored alongside the structures. For example, if you want to provide details on cutoffs, k-point grids, DFT functionals, etc.
Agree.
Q1: Are theoretical and experimental the correct two options?
I suggest yes:
There is a paper from ICSD: Recent developments in the Inorganic Crystal Structure Database: theoretical crystal structure data and related features http://scripts.iucr.org/cgi-bin/paper?in5024, where, for example, we see:
In order to be included in the ICSD, a theoretical structure has to be fully characterized, the atomic coordinates determined and the composition fully specified, similarly to *experimental structures*.
Table 1 Comparison of databases containing experimental and/or theoretical crystal structures
(14 uses of "experimental structure")
(26 uses of "theoretical structure")
So, I argue, these are the terms to use.
As for
xxx_yyy = { experimental | theoretical }
I suggest NOT using "structure_type" as that actually means something different.
Maybe "determination_type"
"experimentally determined structure" gets 20,000 Google hits.
Admittedly, "theoretically determined crystal structure" has only 3 hits, so maybe that is a bit of a problem.
Next idea?