Standardize validation issues on belgif types
Standardize common non-schemaViolation validation issues for belgif types. Currently CBSS has defined their own types:
- urn:problem-type:cbss:input-validation:invalidSsin (when ssin is not found or has invalid checksum)
- urn:problem-type:cbss:input-validation:replacedSsin (when ssin has been replaced: use new ssin)
- urn:problem-type:cbss:input-validation:canceledSsin (when ssin has been canceled)
- urn:problem-type:cbss:input-validation:invalidPeriod (when period is invalid: startDate after endDate)
- urn:problem-type:cbss:input-validation:invalidEnterpriseNumber (when enterprise number is not found or has invalid checksum)
Should we add the functional domain in their URN, corresponding to the related openapi schema (person-identifier, time, organization-identifier)? E.g.
- urn:problem-type:belgif:input-validation:time:invalidPeriod
- urn:problem-type:belgif:input-validation:person-identifier:replacedSsin
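To make the proposed naming concrete, here is a minimal sketch of how such URNs could be composed, assuming the layout `urn:problem-type:<organization>:input-validation[:<domain>]:<name>` from the examples above. The helper `issue_type_urn` is hypothetical and not part of any Belgif library.

```python
from typing import Optional


def issue_type_urn(organization: str, name: str, domain: Optional[str] = None) -> str:
    """Compose an input-validation issue type URN, optionally scoped to a functional domain."""
    parts = ["urn", "problem-type", organization, "input-validation"]
    if domain:
        parts.append(domain)  # e.g. "time" or "person-identifier" (assumed to match the openapi schema)
    parts.append(name)
    return ":".join(parts)


print(issue_type_urn("cbss", "invalidPeriod"))
print(issue_type_urn("belgif", "invalidPeriod", domain="time"))
print(issue_type_urn("belgif", "replacedSsin", domain="person-identifier"))
```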
Updated issue type list of CBSS:
urn:problem-type:cbss:input-validation:replacedSsin
urn:problem-type:cbss:input-validation:canceledSsin
urn:problem-type:cbss:input-validation:invalidSsin
urn:problem-type:cbss:input-validation:unknownSsin
urn:problem-type:cbss:input-validation:invalidPeriod
urn:problem-type:cbss:input-validation:invalidIncompleteDate
urn:problem-type:cbss:input-validation:invalidYearMonth
urn:problem-type:cbss:input-validation:invalidEnterpriseNumber
urn:problem-type:cbss:input-validation:invalidEstablishmentUnitNumber
Maybe a single issue type "invalid" suffices for all non-schema input structure validation issues? If the name of the input is returned, the client should be able to know which type it has and its associated rules. The detail property can still provide a specific explanation.
That sounds like a good idea to us.
urn:problem-type:cbss:input-validation:invalidStructure would replace these:
- urn:problem-type:cbss:input-validation:invalidSsin
- urn:problem-type:cbss:input-validation:invalidIncompleteDate
- urn:problem-type:cbss:input-validation:invalidYearMonth
- urn:problem-type:cbss:input-validation:invalidEnterpriseNumber
- urn:problem-type:cbss:input-validation:invalidEstablishmentUnitNumber
... and the same issue can then also be used for other types.
We prefer to keep urn:problem-type:cbss:input-validation:invalidPeriod (end date before start date) separate.
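As an illustration of the single invalidStructure issue type, a rough sketch of a checksum validation that reports it. The issue properties (`in`, `name`, `value`) and the modulo-97 rule with a leading "2" for births from 2000 onwards are assumptions here, and the function names are hypothetical.

```python
from typing import Optional

INVALID_STRUCTURE = "urn:problem-type:cbss:input-validation:invalidStructure"


def ssin_checksum_ok(ssin: str) -> bool:
    """Modulo-97 check (assumed rule: births from 2000 on are checked with a leading '2')."""
    if len(ssin) != 11 or not (ssin.isascii() and ssin.isdigit()):
        return False
    base, check = ssin[:9], int(ssin[9:])
    return check in (97 - int(base) % 97, 97 - int("2" + base) % 97)


def validate_ssin(name: str, value: str, location: str = "query") -> Optional[dict]:
    """Return an issue dict when the value fails the check, otherwise None."""
    if ssin_checksum_ok(value):
        return None
    return {
        "type": INVALID_STRUCTURE,
        "title": "Input value has invalid structure",
        "detail": f"SSIN {value} has an invalid checksum",  # for troubleshooting, not for parsing
        "in": location,   # assumed property: where the input was found
        "name": name,     # assumed property: name of the input
        "value": value,
    }
```

The same issue type could be returned by an equivalent check on an enterprise number or an incomplete date; only `name` and `detail` would change.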
Could we discuss moving these issue types from cbss to belgif namespace on the next workgroup?
ok, added to agenda of next WG
At CBSS we also introduced urn:problem-type:cbss:input-validation:invalidRefData to signal invalid reference data.
E.g. /socialBenefits?socialBenefitCondition=BAD, where /refData/socialBenefitConditions/BAD does not exist.
This could be seen as a specific case of a resource identifier value that doesn't point to an existing resource. For path parameters, there's the 404 resourceNotFound problem, but we could also consider adding a general 'urn:problem-type:belgif:input-validation:resourceNotFound' issue type when identifier values in query params or request bodies don't match a resource.
Having an issue type that can be used both for non-existing ref data and for other non-existing resource references sounds good to us. But reusing resourceNotFound for that can be a bit confusing.
How about urn:problem-type:belgif:input-validation:referencedResourceNotFound?
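A small sketch of what reporting such an issue could look like for the socialBenefitCondition example. The set `KNOWN_CONDITIONS` is a hypothetical stand-in for the reference data, and the issue properties other than `type` are assumptions.

```python
from typing import Optional

REFERENCED_RESOURCE_NOT_FOUND = "urn:problem-type:belgif:input-validation:referencedResourceNotFound"

# Hypothetical stand-in for the codes exposed under /refData/socialBenefitConditions
KNOWN_CONDITIONS = {"A", "B"}


def check_condition(value: str) -> Optional[dict]:
    """Return an issue when the query parameter does not point to existing reference data."""
    if value in KNOWN_CONDITIONS:
        return None
    return {
        "type": REFERENCED_RESOURCE_NOT_FOUND,
        "title": "Referenced resource not found",
        "detail": f"Referenced resource socialBenefitCondition = '{value}' does not exist",
        "in": "query",
        "name": "socialBenefitCondition",
        "value": value,
    }


print(check_condition("BAD"))
```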
Sometimes types are shared across multiple operations, but validation rules may differ depending on the situation. E.g. when you reuse a resource type for GET and POST, you may want to reject a POST in which the "id" property is filled in. Conversely, an input that is defined as optional in the OpenAPI definition may be required in certain situations, e.g. a LocalizedString for which the "nl" and "fr" properties are required.
To cover these, we have introduced these issue types at CBSS:
- urn:problem-type:cbss:input-validation:rejectedInput: Input is not allowed in this context
- urn:problem-type:cbss:input-validation:requiredInput: Input is required in this context
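To illustrate, a minimal sketch of context-specific checks for a POST that reuses the schema of the GET resource. The `description` property (a LocalizedString) and the function name are hypothetical, as are the issue properties other than `type`.

```python
REJECTED_INPUT = "urn:problem-type:cbss:input-validation:rejectedInput"
REQUIRED_INPUT = "urn:problem-type:cbss:input-validation:requiredInput"


def validate_create(body: dict) -> list:
    """Checks that the shared schema cannot express for this specific operation."""
    issues = []
    if "id" in body:
        # "id" is allowed by the shared schema but must not be supplied on creation
        issues.append({"type": REJECTED_INPUT, "title": "Input is not allowed in this context",
                       "in": "body", "name": "id", "value": body["id"]})
    description = body.get("description") or {}
    for language in ("nl", "fr"):
        # optional in the schema, required in this context
        if not description.get(language):
            issues.append({"type": REQUIRED_INPUT, "title": "Input is required in this context",
                           "in": "body", "name": f"description.{language}"})
    return issues


print(validate_create({"id": 1, "description": {"nl": "voorbeeld"}}))
```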
As these urn:problem-type:cbss:input-validation:* issue types are now also present in the Belgif rest-problem library, it would be good if they could be standardized under urn:problem-type:belgif:input-validation:*
- urn:problem-type:belgif:input-validation:invalidStructure = Input value has invalid structure
- urn:problem-type:belgif:input-validation:outOfRange = Input value is out of range
- urn:problem-type:belgif:input-validation:referencedResourceNotFound = Referenced resource not found
- urn:problem-type:belgif:input-validation:rejectedInput = Input is not allowed in this context
- urn:problem-type:belgif:input-validation:requiredInput = Input is required in this context
- urn:problem-type:belgif:input-validation:invalidPeriod = Period is invalid
- urn:problem-type:belgif:input-validation:replacedSsin = SSIN has been replaced, use new SSIN
- urn:problem-type:belgif:input-validation:canceledSsin = SSIN has been canceled
- urn:problem-type:belgif:input-validation:unknownSsin = SSIN does not exist
WG discussion:
invalidStructure: e.g. invalid checksum
invalidPeriod: because it involves two inputs (start + end) that may be split across query parameters
outOfRange: e.g. page number greater than the total number of pages. Should we return an empty collection response "200" instead of a problem? => split to new issue. Scenarios:
- concurrency - deleted items in between page consults
- no total number of pages given in previous response, jump to later page immediately
referencedResourceNotFound: when a request to a resource contains an identifier of another resource that doesn't exist => unknownSsin is a specific occurrence of this one: can't it be generalized within referencedResourceNotFound?
replacedSsin, canceledSsin:
- should it be in belgif or CBSS-specific? ssin is in belgif-person-identifier as well (but not REST guide), would it be reusable in same level?
Should issue types related to business-specific schemas be documented in the REST guide, somewhere in the FedVoc artifacts, or next to the business-specific openapi schemas (where they would be less easy to find)? Naming convention of these issue types: include the ontology (= business domain) in the issue identifiers?
rejectedInput,requiredInput: permissive schemas that are reused in multiple operations, but should be more restricted according to the specific operation.
CBSS RestDesignTeam agrees on using general referencedResourceNotFound issue type instead of unknownSsin.
Maybe an HTTP 400 Bad Request is indeed not the right approach when requesting a page beyond the number of available pages.
But doesn't urn:problem-type:belgif:input-validation:outOfRange seem generally useful regardless?
e.g. for a date that cannot be in the past / future / ... and other runtime checks for dynamically determined ranges?
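For instance, a runtime check on a dynamically determined range could look roughly like the sketch below. The function name, the `today` parameter (injected to keep the check testable) and the issue properties other than `type` are assumptions.

```python
from datetime import date
from typing import Optional

OUT_OF_RANGE = "urn:problem-type:belgif:input-validation:outOfRange"


def check_not_in_future(name: str, value: date, today: Optional[date] = None) -> Optional[dict]:
    """Return an outOfRange issue when the date lies after today, otherwise None."""
    today = today or date.today()  # the upper bound is only known at runtime
    if value <= today:
        return None
    return {
        "type": OUT_OF_RANGE,
        "title": "Input value is out of range",
        "detail": f"{name} must not be later than {today.isoformat()}",
        "in": "query",
        "name": name,
        "value": value.isoformat(),
    }


print(check_not_in_future("birthDate", date(2999, 1, 1)))
```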
I've tried to summarize the discussion in https://github.com/belgif/rest-guide/wiki/Input-validation-issue-types
IMO, the issue types referencedResourceNotFound, replacedSsin and canceledSsin would require the least discussion to standardize. For rest-problem-java, we'll have to discuss whether or how to include unstandardized issue types before releasing 1.0.
The page https://github.com/belgif/rest-guide/wiki/Input-validation-issue-types was briefly presented on the WG. Feedback welcome, discussion will continue next WG meeting.
Some feedback from CBSS on https://github.com/belgif/rest-guide/wiki/Input-validation-issue-types:
Proposed new issue types
To no surprise, we endorse the proposed issue types, as we already use them as internal standard at CBSS.
Other: payload is not valid to content-type (e.g. no valid JSON payload for content-type application/json)
-> doesn't that fall under urn:problem-type:belgif:input-validation:schemaViolation?
Issue types specific to a business concept
For the type definitions that are standardized under Belgif (https://www.belgif.be/specification/rest/api-guide/#rule-oas-comdef), we believe it makes sense for the issue types related to those definitions to also be standardized under Belgif. But the task of identifying, naming and describing domain-specific issue types (e.g. "replacedSsin" or rather "mergedSsin", or just "canceledSsin" with new SSIN as info) is probably best suited for the FedVoc working group.
Bloating the https://www.belgif.be/specification/rest/api-guide/ too much is indeed a concern we share. We think the changes to the main document can be limited to listing the standardized issue types in a table under https://www.belgif.be/specification/rest/api-guide/#bad-request. The actual documentation and examples for each issue type can be extracted, so we have for example: https://www.belgif.be/specification/rest/api-guide/issues/invalidStructure.html (this URL would also be used in the href property of such issue)
We like the idea of adding the sub-domain to the issue type (it should correspond to the Belgif artifact in which the related component is located):
urn:problem-type:belgif:input-validation:person-identifier:canceledSsin
Level of detail in issue types
We don't think we should impose that clients must always do input validation themselves. Some validations are impossible to perform on the client, and sometimes it may be undesirable to duplicate all validation logic on the client (e.g. in dedicated backend-for-frontend).
At the same time, we don't want clients to parse the free text title/detail message. So in our opinion we must try to communicate the details of failed input validations in a structured way to the client.
Having the most common input validation issue types standardized, prevents organisations from having to continuously invent their own issue types. They CAN be used in the scenarios where it makes sense. As long as it's not stated otherwise in the REST guide, it's not a MUST.
Issue type depending on implementation details
API gateways, editors, validators, generators, ... are slow to catch up with OpenAPI spec evolutions. So at least for now there is still a need for issue types for validations that cannot be expressed through the schema. Indeed, it's not correct to return "schemaViolation" already for something that is not yet expressed in the OpenAPI. At one point, those issue types could be deprecated in favor of enforcing them in the schema and returning urn:problem-type:belgif:input-validation:schemaViolation.
Use of rejectedInput/requiredInput could indeed mostly be avoided by designing dedicated schemas for each operation. But that won't always be possible/desirable, so the need for these issue types is still there.
YearMonth: indeed, fine by us to change the pattern to ^[0-9]{4}-(0[1-9]|1[0-2])$.
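A quick way to see what the stricter pattern accepts, as a sketch; the helper name `is_year_month` is hypothetical. `fullmatch` is used because in Python `$` on its own would also accept a trailing newline.

```python
import re

YEAR_MONTH = re.compile(r"^[0-9]{4}-(0[1-9]|1[0-2])$")


def is_year_month(value: str) -> bool:
    """True when the value is four digits, a dash and a month from 01 to 12."""
    return YEAR_MONTH.fullmatch(value) is not None


for candidate in ("2023-01", "2023-12", "2023-00", "2023-13", "2023-1"):
    print(candidate, is_year_month(candidate))
```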
Other: payload is not valid to content-type (e.g. no valid JSON payload for content-type application/json)
-> doesn't that fall under urn:problem-type:belgif:input-validation:schemaViolation?
You don't need to have the OpenAPI (and validate against it) to check if payload is valid wrt its specified content-type http header. For binaries, it can apply as well: e.g. image/jpeg but content is a PNG.
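To illustrate that point, a rough sketch of a check that only looks at the declared media type and the payload bytes, with no schema involved. The function name is hypothetical and only three media types are covered.

```python
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def payload_matches_content_type(content_type: str, payload: bytes) -> bool:
    """Check the payload against its declared media type without consulting any OpenAPI schema."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        try:
            json.loads(payload)
        except ValueError:  # JSONDecodeError and UnicodeDecodeError are both ValueErrors
            return False
        return True
    if media_type == "image/jpeg":
        return payload.startswith(JPEG_SIGNATURE)
    if media_type == "image/png":
        return payload.startswith(PNG_SIGNATURE)
    return True  # other media types are not checked in this sketch


print(payload_matches_content_type("image/jpeg", PNG_SIGNATURE + b"..."))
```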
We like the idea of adding the sub-domain to the issue type (it should correspond to the Belgif artifact in which the related component is located):
urn:problem-type:belgif:input-validation:person-identifier:canceledSsin
The subdomain identifier is a bit of a special case here: it isn't aligned with a separate business domain (there's just the Person and Organization ontologies in FedVoc), but it was decided to split it because the identifiers are often used without other domain data.
We'll have to make sure not to confound validation on the entity with validation on the identifier only; e.g. in the stopped or merged organizations case.
Level of detail in issue types
Having the most common input validation issue types standardized, prevents organisations from having to continuously invent their own issue types. They CAN be used in the scenarios where it makes sense. As long as it's not stated otherwise in the REST guide, it's not a MUST.
Maybe some middle ground can be found, e.g. belgif:validation-ext issue types that MAY be used (instead of SHOULD) and for which we don't provide much forward stability guarantees to avoid all too lengthy up-front standardization efforts.
Issue type depending on implementation details
It's usually possible to make quite some changes in the OpenAPI (e.g. schema names, going from OpenAPI 2.0 to 3.0), without breaking runtime compatibility. If more detailed issue types and validation behavior would be specified and thereby may be used by the client application, each future change would require a new major version of the API itself instead of only changing the OpenAPI document, making the web service more difficult to evolve.
Maybe some middle ground can be found, e.g. belgif:validation-ext issue types that MAY be used (instead of SHOULD) and for which we don't provide much forward stability guarantees to avoid all too lengthy up-front standardization efforts.
Or belgif-ext:validation? So this general mechanism could also be used for other problem / issue types?
I was thinking of the belgif part as the organization name, like in [prb-type], and of "validation extensions" as a domain/grouping of additional validation issue types maintained by belgif.
Probably doesn't matter all that much, both choices could be applied more generally, and the naming is meant for readability rather than parsing.
WG agrees upon adding referencedResourceNotFound
Only generic issue types should be in REST guide
Issue types specific to business domains, should be aligned with their openapi modules and managed outside the REST guide. TODO: propose a way of managing them (e.g. add asciidoc in each openapi project), and governance with FedVoc WG.
WG conclusions:
- keep schemaViolation separate from other non-schema programmatic validation issues
- new issue type that broadly covers programmatic input validation issues for which a client is not expected to provide handling specific to the precise cause
- This includes cases such as: invalid checksum, period end date before begin date, combination of parameters. It excludes canceled SSIN, stopped enterprise, referencedResourceNotFound, ...
- a more specific issue 'detail' value helps programmers to troubleshoot, but is not meant to be parsed.
- APIs can still provide their own issue types for cases which typically require specific handling by client applications
WG decided on issue name: invalidInput
Even though it's somewhat redundant with "input-validation" in the URN prefix and a bit vague, the name is understandable and more specific names under consideration were found to be more confusing (e.g. anomaly).
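As a sketch of how the agreed issue type might end up in a response, assuming the URN follows the existing `urn:problem-type:belgif:input-validation:` prefix and that the issue is wrapped in a `badRequest` problem. The issue properties, the titles and the function names are assumptions, not the final wording of the guide.

```python
import json
from datetime import date

# Assumed URN: the agreed name appended to the existing input-validation prefix
INVALID_INPUT = "urn:problem-type:belgif:input-validation:invalidInput"


def period_issues(start: date, end: date) -> list:
    """Report a period whose end date precedes its start date as a generic invalidInput issue."""
    if start <= end:
        return []
    return [{
        "type": INVALID_INPUT,
        "title": "Invalid input",
        "detail": "endDate should not precede startDate",  # helps troubleshooting, not meant to be parsed
        "in": "query",
        "name": "endDate",
        "value": end.isoformat(),
    }]


def bad_request(issues: list) -> dict:
    """Wrap issues in a problem document (assumed badRequest problem type)."""
    return {
        "type": "urn:problem-type:belgif:badRequest",
        "status": 400,
        "title": "Bad Request",
        "issues": issues,
    }


print(json.dumps(bad_request(period_issues(date(2024, 5, 1), date(2024, 4, 1))), indent=2))
```

A client would branch on the issue `type` only; the `detail` text stays free-form.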
Next steps:
- create PR, update wiki
- update REST problem library