OPTIMADE icon indicating copy to clipboard operation
OPTIMADE copied to clipboard

Should atomic properties be REQUIRED?

Open giovannipizzi opened this issue 4 years ago • 6 comments

See e.g. this comment and the following.

In particular: dimension_types is optional, but they are important. E.g., a Si_2 molecule and a Si crystal might only differ on the dimension_types, and they are very different things.

Or cartesian_site_positions, species_at_sites, species.

Or at least, should we say that if one is present, all MUST be present?

giovannipizzi avatar Jun 12 '20 20:06 giovannipizzi

dimension_types is optional, but they are important. E.g., a Si_2 molecule and a Si crystal might only differ on the dimension_types, and they are very different things.

Or cartesian_site_positions, species_at_sites, species.

Or at least, should we say that if one is present, all MUST be present?

I remember @merkys (pinging him on this) argumented that making cartesian_site_positions required may put undue stress on servers that compute them on-the-fly (our's does) and will delay response and waste bandwidth for clients that do not need this information (e.g. you are only interested in cell volumes and crystal densities). I would appreciate if Andrius comments more on this point.

sauliusg avatar Jun 13 '20 10:06 sauliusg

I see. And now I remember the use case of not giving atomic positions.

However, if atomic positions are given, I think we want to give also the rest of the arrays, right?

giovannipizzi avatar Jun 13 '20 21:06 giovannipizzi

I'm a bit reluctant to phrase these requirements in the specification, i.e., if A is not null then B must also not be null. In a previous round we really tried to simplify the requirements on properties in terms of their querability and null-ness so the options fall into a few categories, rather than having to be hand-coded for each individual property.

rartino avatar Jun 13 '20 23:06 rartino

I remember @merkys (pinging him on this) argumented that making cartesian_site_positions required may put undue stress on servers that compute them on-the-fly (our's does) and will delay response and waste bandwidth for clients that do not need this information (e.g. you are only interested in cell volumes and crystal densities). I would appreciate if Andrius comments more on this point.

Current COD supports coordinate-based properties by request, thus it adheres to MUST requirement already despite the computations it takes. So I am for making the requirement stricter.

merkys avatar Jun 15 '20 05:06 merkys

We've just run into this again over at https://github.com/Materials-Consortia/optimade-python-tools/pull/560. The question is whether the specification requires values of SHOULD fields depending on the values of other SHOULD fields, e.g. if dimension_types = [1, 1, 0], is nperiodic_dimensions still allowed to be null? From my current reading of the specification, I would lean towards yes, as any ambiguity should be resolved in the least strict way (at least for validation).

Currently the validator does not allow this, as the check for consistency between correlated fields like these also includes a check for nullity. The PR linked above relaxes this constraint and allows all SHOULD fields to be null, no matter the values of other fields.

What is unclear to @CasperWA and myself is whether this is an open question on the interpretation of v1.0 of the specification, or whether the above discussion is a proposal for modifying the specification in, say, v1.1.

In terms of our implementation, we are considering a process whereby a warning is emitted if one of two correlated fields is not provided, with an error only being raised if the values are inconsistent. This is already the case for the "SHOULD" field assemblies being present if the "MUST" field structure_features contains the value 'assemblies'.

ml-evs avatar Nov 04 '20 13:11 ml-evs

We've just run into this again over at Materials-Consortia/optimade-python-tools#560. The question is whether the specification requires values of SHOULD fields depending on the values of other SHOULD fields, e.g. if dimension_types = [1, 1, 0], is nperiodic_dimensions still allowed to be null? From my current reading of the specification, I would lean towards yes, as any ambiguity should be resolved in the least strict way (at least for validation).

I would think the same. If the correlation of fields is not explicitly established in the specification, then I guess they should be treated as uncorrelated.

What is unclear to @CasperWA and myself is whether this is an open question on the interpretation of v1.0 of the specification, or whether the above discussion is a proposal for modifying the specification in, say, v1.1.

Good question. It would be interesting to look into how other specification projects deal with reducing ambiguity :)

merkys avatar Nov 04 '20 16:11 merkys