OPTIMADE

No way to use IS KNOWN/IS UNKNOWN in list comparisons

Open • merkys opened this issue 3 years ago • 2 comments

While thinking about how booleans are represented in queries in #345, I noticed that the filter language has no provision for using IS KNOWN/IS UNKNOWN in list comparisons. I think the filter grammar could be extended to support queries of the HAS IS KNOWN type.
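To make the proposal concrete, here is a minimal Python sketch of how HAS IS KNOWN / HAS IS UNKNOWN might be evaluated against a single list-valued property. It assumes that null list elements are represented as `None`, that `HAS IS KNOWN` matches when at least one element is non-null and `HAS IS UNKNOWN` when at least one element is null, and that a property which is null as a whole matches neither. The function names are hypothetical, and none of these semantics come from the OPTIMADE specification.

```python
from typing import Any, Optional, Sequence

# Hypothetical evaluators for the proposed "HAS IS KNOWN" / "HAS IS UNKNOWN"
# filters. Null list elements are assumed to be represented as None.


def has_is_known(values: Optional[Sequence[Any]]) -> bool:
    """True if the list property has at least one non-null element."""
    if values is None:
        # Assumption: a property that is unknown as a whole matches neither form.
        return False
    return any(v is not None for v in values)


def has_is_unknown(values: Optional[Sequence[Any]]) -> bool:
    """True if the list property has at least one null element."""
    if values is None:
        return False
    return any(v is None for v in values)


# Illustrative values for some list-valued property.
entries = {
    "a": [1.0, 2.0, 3.0],   # fully known
    "b": [1.0, None, 3.0],  # partially known
    "c": [None, None],      # all elements null
    "d": None,              # whole property unknown
}

known = sorted(k for k, v in entries.items() if has_is_known(v))
unknown = sorted(k for k, v in entries.items() if has_is_unknown(v))
print(known)    # ['a', 'b']
print(unknown)  # ['b', 'c']
```

Under this reading, `HAS IS UNKNOWN` selects entries whose list is present but contains at least one null, a case that a whole-property `IS UNKNOWN` test does not distinguish.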

merkys, Dec 17 '20 12:12

The immediate thought is to introduce a NULL constant token alongside the TRUE and FALSE constants proposed in #345.

However, since we dropped the use of null in coordinates to represent unknown coordinates, as far as I know the only standardized use of null in lists is in lattice_vectors. It may be a bit late to do this now, but we could consider dropping null inside lists as a concept in OPTIMADE, and only allow fields in their entirety to be either KNOWN or UNKNOWN. That would further simplify our data model, and explain why we wouldn't need a NULL constant. But I say that as someone who generally dislikes the idea of embedding nulls in data of other types.
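A minimal Python sketch of the restriction suggested here, assuming JSON null is decoded as Python `None`. The helper names are hypothetical. The partly-null `lattice_vectors` value is only an illustration of the kind of value that would no longer be allowed, not an example taken from the specification or from any database.

```python
from typing import Any

# Sketch of the proposed data-model restriction: a property may be null as a
# whole (UNKNOWN), but lists may not contain nulls at any nesting depth.


def contains_nested_null(value: Any) -> bool:
    """True if a list value contains None anywhere inside it."""
    if isinstance(value, list):
        return any(item is None or contains_nested_null(item) for item in value)
    return False


def allowed_under_no_null_in_lists(value: Any) -> bool:
    """A whole-field null is allowed; a list with embedded nulls is not."""
    return value is None or not contains_nested_null(value)


# Illustrative lattice_vectors values.
periodic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
partly_null = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [None, None, None]]

print(allowed_under_no_null_in_lists(periodic))     # True
print(allowed_under_no_null_in_lists(partly_null))  # False
print(allowed_under_no_null_in_lists(None))         # True
```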

rartino, Dec 17 '20 13:12

Introducing a NULL constant seems quite natural. Many programming languages, as well as SQL, have one, so maybe we could complement IS KNOWN/IS UNKNOWN with a syntactically simpler and more powerful NULL constant? Of course, we would have to define what the < and > operators mean for it, just as for the boolean constants in #345.
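One possible answer, borrowed from SQL rather than from anything decided for OPTIMADE, is three-valued logic: any comparison involving NULL yields UNKNOWN, and a filter keeps only the entries for which the whole condition is TRUE. A minimal Python sketch, with `None` standing for both the NULL value and the UNKNOWN truth value; the function names are hypothetical:

```python
from typing import Optional

# SQL-style three-valued logic: True, False, or None (UNKNOWN).
Truth = Optional[bool]


def lt(a: Optional[float], b: Optional[float]) -> Truth:
    """a < b, but UNKNOWN if either operand is NULL."""
    if a is None or b is None:
        return None
    return a < b


def and3(p: Truth, q: Truth) -> Truth:
    """Kleene AND: FALSE dominates, otherwise UNKNOWN propagates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def keep(condition: Truth) -> bool:
    """As in a SQL WHERE clause, only TRUE selects an entry."""
    return condition is True


print(lt(1.0, None))               # None
print(and3(lt(1.0, None), False))  # False
print(keep(lt(None, 2.0)))         # False
print(keep(lt(1.0, 2.0)))          # True
```

A consequence of this convention is that in SQL even `x = NULL` evaluates to UNKNOWN, which is why SQL needs the separate `IS NULL` test.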

> However, since we dropped the use of null in coordinates to represent unknown coordinates, as far as I know the only standardized use of null in lists is in lattice_vectors. It may be a bit late to do this now, but we could consider dropping null inside lists as a concept in OPTIMADE, and only allow fields in their entirety to be either KNOWN or UNKNOWN. That would further simplify our data model, and explain why we wouldn't need a NULL constant. But I say that as someone who generally dislikes the idea of embedding nulls in data of other types.

I do not like the idea of restricting the data model this way, as I do not think all uses of NULL values in lists can be avoided. For instance, we are discussing including CIF data in OPTIMADE, and the CIF standard has two NULL-like concepts: the unknown value and the inapplicable value. I doubt we can do without either of these concepts, and certainly not without both.
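In CIF, an unknown value is written as `?` and an inapplicable value as `.`. The Python sketch below shows why mapping both onto a single JSON null would lose information. It uses hypothetical sentinel names and a toy, already-tokenised column rather than a real CIF parser; real CIF numbers can also carry standard uncertainties such as `0.512(3)`, which this sketch does not handle.

```python
from enum import Enum
from typing import List, Union


class Missing(Enum):
    """The two CIF null-like values."""
    UNKNOWN = "?"
    INAPPLICABLE = "."


Value = Union[float, Missing]


def parse_cif_values(tokens: List[str]) -> List[Value]:
    """Convert already-tokenised CIF values into floats or Missing markers."""
    out: List[Value] = []
    for tok in tokens:
        if tok == "?":
            out.append(Missing.UNKNOWN)
        elif tok == ".":
            out.append(Missing.INAPPLICABLE)
        else:
            out.append(float(tok))
    return out


column = parse_cif_values(["0.5", "?", "."])
print(column)  # [0.5, <Missing.UNKNOWN: '?'>, <Missing.INAPPLICABLE: '.'>]

# Collapsing both markers to a single null (None) makes them indistinguishable.
collapsed = [None if isinstance(v, Missing) else v for v in column]
print(collapsed)  # [0.5, None, None]
```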

merkys, Dec 22 '20 06:12