OPTIMADE icon indicating copy to clipboard operation
OPTIMADE copied to clipboard

Clarify required fields in a response

Open CasperWA opened this issue 5 years ago • 9 comments

Discussing the implementation of #210 in optimade-python-tools (PR Materials-Consortia/optimade-python-tools#153) with @ltalirz (and @ml-evs) we have found that it is quite unclear what is meant by SHOULD in the spec at the moment. Specifically these lines are fuzzy.

We understand the need for certain fields to be "on-request", i.e., only given if explicitly asked for via response_fields. As well as having other fields that are REQUIRED in the response at all times, like id.

We suggest to combine the **Support** and **Response** part of all fields under Entry List and rewrite the section linked above to be more clear.

Tagging @merkys and @rartino here, since they were the main contributors to #210.

CasperWA avatar Feb 04 '20 18:02 CasperWA

we have found that it is quite unclear what is meant by SHOULD in the spec at the moment. Specifically these lines are fuzzy.

Is it possible to elaborate a bit on what is not clear? Your link goes to a single line that does not contain the word SHOULD. Are you referring to "SHOULD NOT" in point 3 of the section 'Entry Listing Endpoints'?

That SHOULD NOT means that an API implementation should avoid including properties in the response that are neither marked as REQUIRED nor are included in a supplied response_fields query parameter. However, it is still standards-conformant if it does.

We suggest to combine the **Support** and **Response** part of all fields under Entry List

Support tells you when a property is allowed to behave as null or not in queries and as return values.

Response tells how the property interacts with the response_fields query parameter.

Can you give examples of how you would prefer these to be written into one "thing"?

rartino avatar Feb 04 '20 19:02 rartino

I'll try my best to explain, but I'm sure it's going to get convoluted.

First, concerning the three categories for the Entry list specific properties:

  • Properties marked as REQUIRED in the response. These properties MUST always be present for all entries in the response.

This seems clear to me: It means a property MUST be present for a resource object in the response, no matter the values given to response_fields.

  • Properties marked as REQUIRED only if the query parameter response_fields is not part of the request, or if they are explicitly requested in response_fields. Otherwise they MUST NOT be included. One can think of these properties as consituting a default value for response_fields when that parameter is omitted.

The first sentence here is difficult for me to understand - mostly because I cannot seem to properly parse the first part. What I understand from the sentence (with a revision for my parsing abilites) is that this is a category of properties that do not necessarily have to be present in the response by default, but the server MUST implement them and return them if they are explicitly asked for via response_fields. Also, there is a minor error in the missing "t" in constituting ... Concerning the second sentence, it makes sense, but not in connection with the first one. At least not with my understanding of it. The second sentence refers to "a defaullt value for response_fields", which is what I would call the properties that are REQUIRED to be implemented by a server and passed by default to the response. Hmm, reading the sentences over and over, I feel like I am starting to understand that maybe the first one can be read that way - but it is by no means clear. Maybe it could be rewritten as such:

  • Properties that MUST be supported by the implementation and are considered default fields in the response for the given resource object(s), i.e., if the response_fields query parameter is not supplied in the request, these properties are REQUIRED in the response. These properties MUST be left out if not included explicitly in the value list for response_fields.
  • Properties not marked as REQUIRED in any case, MUST be included only if explicitly requested in the query parameter response_fields. Otherwise they SHOULD NOT be included.

This seems clear to me: It means a property that an implementation MUST support, but should be left out of the default response and only included if explicitly requested via response_fields.

This also leaves no room for OPTIONAL properties it seems, since there is no category specified for these.

It may be helpful to make a table for these properties, something like this:

Category REQUIRED in implementation OPTIONAL in implementation REQUIRED always in response default in response on-request in response
1 X - X X X
2 X - - X X
3 X - - - X

So that's part one ... Now it comes to the Entry list of properties. Keeping the above in mind, since it acts as the definition of the different kinds of properties there can be, let's look at the universal entry property type:

  • Requirements/Conventions:
  • Support: MUST be supported by all implementations, MUST NOT be null.
  • Query: MUST be a queryable property with support for all mandatory filter features.
  • Response: REQUIRED in the response.

From the Support requirement I understand that it MUST be in the implementation and cannot be null, i.e., have an unknown value. From the Response requirement I understand it falls into category 1 from above.

From the table above, the first part of the Support requirement is clear (since there are no OPTIONAL properties allowed) and then it can't be null. Sure, I mean, I don't think that has to do with whether it's supported by the implementation or not, but with its value.

Let's now look at the universal entry property last_modified:

  • Requirements/Conventions:
  • Support: SHOULD be supported by all implementations, i.e., SHOULD NOT be null.
  • Query: MUST be a queryable property with support for all mandatory filter features.
  • Response: REQUIRED in the response unless the query parameter response_fields is present and does not include this property.

From the Support requirement I can suddenly understand that it SHOULD be supported by the implementation, i.e., it's essentially OPTIONAL, which is not an allowed category (according to the table and definitions above). Furthermore, it SHOULD NOT be null, which means it can. Why not instead then write:

  • Support: OPTIONAL support, i.e., MAY be null.

From the Response requirement I understand that it falls into category 2. But category 2 states that it MUST be returned if requested for via response_fields. This means it MUST be supported by the implementation, but it MAY be null.


Now to a more wholesome part.

I think I'm grasping what the intent is with everything, I surely must have been previously, but coming back to it after some time it seems quite unclear to me what is meant when combining all these things.

By combining Support and Response or otherwise splitting up Support into what is really meant, it should be clearer. Even better would be to relate it back to the three categories of properties already laid out. For example, last_modified could instead read:

  • Description: Date and time representing when the entry was last modified.
  • Type: timestamp.
  • Requirements/Conventions:
    • Implementation: REQUIRED
    • Value: MAY be null
    • Query: MUST be a queryable property with support for all mandatory filter features.
    • Response: Default (category 2).
  • Example:
    • As part of JSON response format: "2007-04-05T14:30Z" (i.e., encoded as an RFC 3339 Internet Date/Time Format string).

However, I don't like having it say that it is REQUIRED in the implementation, while it MAY be null, so it may be prudent to define that OPTIONAL in the implementation means it MUST be returned if explicitly required via response_fields, but the value MAY be null. If you want, one can even add a Recommended (level of integration in the implementation) that states whether it is REQUIRED, SHOULD, or OPTIONAL to have a non-unknown value, i.e., not null.

CasperWA avatar Feb 05 '20 23:02 CasperWA

I admit that the fuzzy lines are indeed difficult to parse.

The meaning of Support and Response in the specification is intended to be orthogonal. Thinking from scratch, there should be two levels of Support:

  • REQUIRED: property values MUST NOT be nulls;
  • OPTIONAL: property MAY have non-null values.

And three levels of Response:

  • always REQUIRED: MUST be present in the response;
  • REQUIRED unless excluded: MUST be in the response unless response_fields is provided and does not contain this property;
  • REQUIRED if included: MUST be present only if provided in response_fields.

Or am I missing something? Putting it down this way sounds simple to me.

merkys avatar Feb 06 '20 17:02 merkys

  • OPTIONAL: property is MAY have non-null values.

Could just as well be property MAY have null value(s), right?

Or am I missing something? Putting it down this way sounds simple to me.

I think that sums up quite neatly my wall of text, yes :)

This also means that an implementation should put in all allowed properties somewhere, or retrieve them from the spec, so if they choose not to implement support for an OPTIONAL property, they can still return it in the response - but with the value null.

CasperWA avatar Feb 06 '20 17:02 CasperWA

Could just as well be property MAY have null value(s), right?

Right.

This also means that an implementation should put in all allowed properties somewhere, or retrieve them from the spec, so if they choose not to implement support for an OPTIONAL property, they can still return it in the response - but with the value null.

Same here :)

merkys avatar Feb 06 '20 17:02 merkys

there should be two levels of Support:

  • REQUIRED: property values MUST NOT be nulls;
  • OPTIONAL: ~~property is MAY have non-null values.~~ property MAY have null values.

You are omitting:

  • RECOMMENDED: property SHOULD NOT have null values

And three levels of Response:

  • always REQUIRED: MUST be present in the response;
  • REQUIRED unless excluded: MUST be in the response unless response_fields is provided and does not contain this property;
  • REQUIRED if included: MUST be present only if provided in response_fields.

This is different from the present specification in that the latter only RECOMMENDS against including fields in the response that are neither required, nor in response_fields, but it still accepts such behavior as standards-conformant. Your text makes such behavior violate the standard.

Is the distinction between MUST and RECOMMENDED in these two cases useful? It is up to us, but I can definitely argue that most implementations should not break them, but scenarios can be created for when it is motivated.

rartino avatar Feb 11 '20 00:02 rartino

This issue has been marked with the v1.0.0 milestone. My personal interpretation is that this is NOT release-blocking and can be moved to the v1.1 milestone, because it is mostly about clarifying what is in the specification (in a way that makes it easier to read). @CasperWA, do you agree? If so, please change the milestone to v1.1.

rartino avatar Jun 14 '20 22:06 rartino

This issue has been marked with the v1.0.0 milestone. My personal interpretation is that this is NOT release-blocking and can be moved to the v1.1 milestone, because it is mostly about clarifying what is in the specification (in a way that makes it easier to read). @CasperWA, do you agree? If so, please change the milestone to v1.1.

When taking into account the spec changes by @merkys in Materials-Consortia/OPTIMADE#210 it seems that this can indeed be postponed to v1.1. However, this will more clear when we start treating the issue Materials-Consortia/optimade-python-tools#198, i.e., attempting to put the spec to a test in an "implementation". But let's cross that bridge when we get to it, i.e., change this to a v1.1 milestone. What do you think @merkys and @ml-evs?

CasperWA avatar Jun 15 '20 09:06 CasperWA

I'm fine with postponing this issue to v1.1.

merkys avatar Jun 15 '20 17:06 merkys