Extended filtering on relationships

Open ml-evs opened this issue 2 years ago • 4 comments

Currently we expect relationship filtering to be a two-step process:

Note: formulating queries on relationships with entries that have specific property values is a multi-step process. For example, to find all structures with bibliographic references where one of the authors has the last name "Schmit" is performed by the following two steps:

  • Query the references endpoint with a filter authors.lastname HAS "Schmit" and store the id values of the returned entries.
  • Query the structures endpoint with a filter references.id HAS ANY <list-of-IDs>, where <list-of-IDs> are the IDs retrieved from the first query separated by commas.
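For illustration, the two steps above can be sketched on the client side as follows. This is only a rough sketch: the helper names and the example base URL are hypothetical, and it only builds the filter strings and URLs rather than sending any requests. It assumes the usual OPTIMADE JSON response layout, where returned entries sit in a top-level `data` list and each carries an `id`.

```python
from urllib.parse import quote


def quote_string(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def references_filter(lastname: str) -> str:
    """Step 1: filter for the references endpoint."""
    return f"authors.lastname HAS {quote_string(lastname)}"


def extract_ids(response: dict) -> list:
    """Collect the id values from the "data" list of a JSON response."""
    return [entry["id"] for entry in response.get("data", [])]


def structures_filter(reference_ids) -> str:
    """Step 2: filter for the structures endpoint, IDs separated by commas."""
    ids = ", ".join(quote_string(i) for i in reference_ids)
    return f"references.id HAS ANY {ids}"


def query_url(base_url: str, endpoint: str, filter_expr: str) -> str:
    """Build a query URL with a percent-encoded filter (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/{endpoint}?filter={quote(filter_expr)}"
```

A client would fetch `query_url(base, "references", references_filter("Schmit"))`, pass the parsed JSON to `extract_ids`, and then query the structures endpoint with `structures_filter(ids)`. The second request can become long when the first one returns many IDs, which is part of the motivation for a one-step syntax.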

In my implementation, I would like to support doing this in one step, e.g., via /structures?filter=references.doi = "10.1234/12345".

This seems like a trivial extension to the specification (implementations MAY support relationship filtering via...). The only conflict would be the special description field we added for relationship filtering. I have not seen anyone using this, but we can maintain compatibility by reserving it as a keyword and never using description as an attribute name for any entry type (so that references.description is always unambiguously referring to the relationship description).
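To make the reserved-keyword idea concrete, here is a minimal sketch of how an implementation might classify a dotted filter property under that rule. The function name, the returned labels, and the list of entry types are all assumptions for illustration and are not taken from the specification.

```python
# Hypothetical set of entry types that can appear as relationship names.
ENTRY_TYPES = ("structures", "references", "calculations", "files")


def resolve_relationship_path(path: str, entry_types=ENTRY_TYPES):
    """Classify a filter property such as "references.doi".

    Returns a (kind, relationship, property) tuple where kind is one of:
      "attribute"                 plain property of the queried entry
      "relationship_id"           <relationship>.id, already supported
      "relationship_description"  reserved: description of the relationship
      "related_attribute"         proposed: property of the related entry
    """
    head, sep, tail = path.partition(".")
    if not sep or head not in entry_types:
        return ("attribute", None, path)
    if tail == "id":
        return ("relationship_id", head, "id")
    if tail == "description":
        # Reserved keyword: always the relationship description,
        # never an attribute of the related entry.
        return ("relationship_description", head, "description")
    return ("related_attribute", head, tail)
```

Under this rule `references.doi` resolves to an attribute of the related reference, while `references.description` always resolves to the relationship description.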

Am I missing some technical reason that we can't allow this syntax as optional, or was the issue that to be able to handle this robustly across different implementations we have to use the two-step process?

ml-evs, Dec 16 '22 16:12

I think the idea was to limit querying to the information which is normally returned within the response. But I do not see why optional support past that could not be allowed. The exceptional handling of description might be problematic, this is too good a property name to forbid :smile:

merkys, Jun 08 '23 15:06

This is made more relevant by the potential future use cases for /calculations. With our current approach it would be impossible to filter on relationships with calculations that compute a specific property, whereas with this extension we could do e.g. /structures?filter=calculations._my_scan_band_gap<0.5
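As a rough sketch of what such a one-step filter could mean on the server side, the function below evaluates a single comparison against the attributes of related entries, using in-memory dictionaries. The function name and the "any related entry matches" semantics are assumptions; the entry layout follows the usual JSON form, with linked entries listed under `relationships.<type>.data`.

```python
import operator

# Comparison operators of the filter language mapped to Python functions.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">": operator.gt, ">=": operator.ge,
}


def filter_by_related(entries, related_by_id, relationship, prop, op, value):
    """Return entries with at least one related entry whose attribute
    `prop` satisfies `<op> value` (assumed semantics)."""
    compare = OPS[op]
    matches = []
    for entry in entries:
        links = entry.get("relationships", {}).get(relationship, {}).get("data", [])
        for link in links:
            related = related_by_id.get(link["id"])
            if related is None:
                continue  # dangling link: nothing to compare against
            attr = related.get("attributes", {}).get(prop)
            if attr is not None and compare(attr, value):
                matches.append(entry)
                break  # one matching related entry is enough
    return matches


# Made-up example data, not taken from any real database.
calculations = {
    "c1": {"id": "c1", "attributes": {"_my_scan_band_gap": 0.3}},
    "c2": {"id": "c2", "attributes": {"_my_scan_band_gap": 1.2}},
}
structures = [
    {"id": "s1", "relationships": {"calculations": {"data": [{"type": "calculations", "id": "c1"}]}}},
    {"id": "s2", "relationships": {"calculations": {"data": [{"type": "calculations", "id": "c2"}]}}},
    {"id": "s3"},
]
hits = filter_by_related(structures, calculations, "calculations", "_my_scan_band_gap", "<", 0.5)
```

With this example data, only the structure linked to a calculation with a band gap below 0.5 is returned. A database-backed implementation would more likely translate the filter into a join or a subquery than loop in memory.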

ml-evs, Jun 12 '23 18:06

This issue has resurfaced today in the workshop in the same context of filtering structures on the results of their calculations. Thus I think it deserves to have its severity bumped.

With the advent of the /files endpoint, description is now a property of the files entry type. Thus we cannot forbid description as a property name anymore. I guess for OPTIMADE v1.x we can retain this special handling of meta.description of relationships, with minimal loss. In any case files.description is supposed to be a human-readable string, most likely not that useful in queries.

merkys, Jun 11 '24 21:06

I guess for OPTIMADE v1.x we can retain this special handling of meta.description of relationships, with minimal loss. In any case files.description is supposed to be a human-readable string, most likely not that useful in queries.

... or we can try dropping the special provision for meta.description, as it was rarely used. I can draft a PR for the extended filtering. Let us collect opinions about the meta.description issue there.

merkys, Jun 12 '24 05:06