Missing information: publisher codelists
We have a reason field, but we might also want to let publishers use their own short codelists, as with the Companies House enumerations.
On the related JIRA issue for the register's BODS export, you mentioned a missingInfoCode field. I was thinking this might be better as a nested object with a 'reason' and a 'description' (or better names), similar to how we structure unspecified relationships. That might help a little to standardise these two connected but separate sections of the standard.
Yes - we moved missing info reasons to a nested structure in 0.2:
https://github.com/openownership/data-standard/blob/3237fd3feee6e63c52b46a9acaf698ae75f41d54/schema/ownership-or-control-statement.json#L108-L139
There is a similar nested object when the exemption or missing data is at entity level.
So I think the question is whether we want a structure like:
reason - a required field drawn from the BODS closed codelist
originalReason - an optional field whose value comes from an open codelist defined by the source system, and which maps to a code in the closed BODS codelist
description - an optional human-readable description, either of the codes in the open codelist or an inferred description of why the data is missing
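To make the proposal concrete, here is a rough Python sketch of that three-field object. The field names come from the list above; everything else is an assumption for illustration: the codes in CLOSED_REASONS are placeholders rather than the actual BODS codelist, and make_missing_info is a hypothetical helper, not part of any existing tooling.

```python
from typing import Optional

# Placeholder stand-in for the closed BODS codelist (illustrative values only).
CLOSED_REASONS = {"no-beneficial-owners", "subject-exempt", "unknown"}


def make_missing_info(
    reason: str,
    original_reason: Optional[str] = None,
    description: Optional[str] = None,
) -> dict:
    """Build the nested missing-information object sketched above.

    reason is required and must come from the closed codelist.
    originalReason and description are optional and left out when absent.
    """
    if reason not in CLOSED_REASONS:
        raise ValueError(f"reason {reason!r} is not in the closed codelist")
    obj = {"reason": reason}
    if original_reason is not None:
        # Open codelist: any code from the source system is accepted as-is.
        obj["originalReason"] = original_reason
    if description is not None:
        obj["description"] = description
    return obj


# Hypothetical source code and description, for illustration only.
example = make_missing_info(
    "unknown",
    original_reason="src-01",
    description="The source system reported no details.",
)
```

Only reason is validated here, since the open codelist and the free-text description are by definition not something the standard could check.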
originalReason gives analysts a quick way to search based on original data.
description is readable and self-documenting, but it is also subject to change (evidence: the Companies House repo) and harder to analyse. We might still want to keep it so that we can understand some kinks in the data later on.
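A small sketch of that trade-off, assuming the nested structure discussed above. The records, the source codes (src-01, src-02) and the description wordings are all made up for illustration; the point is only that grouping on originalReason stays stable when the human-readable wording drifts.

```python
from collections import Counter

# Made-up records: the first two share a source code, but the wording of the
# description changed between exports.
records = [
    {"reason": "unknown", "originalReason": "src-01",
     "description": "No details were provided."},
    {"reason": "unknown", "originalReason": "src-01",
     "description": "No details have been provided."},
    {"reason": "subject-exempt", "originalReason": "src-02",
     "description": "The subject is exempt."},
]


def count_by(field: str, items: list) -> Counter:
    """Count records by the value of one field, skipping records without it."""
    return Counter(item[field] for item in items if field in item)


by_original_reason = count_by("originalReason", records)
by_description = count_by("description", records)

# originalReason groups the first two records together;
# description splits them because the wording differs.
print(by_original_reason)
print(by_description)
```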