OSCAL
Does the metadata keywords prop ignore trailing spaces?
Question
The description of the keywords property defines the value as a "comma-separated listing of keywords" without explicitly stating how leading & trailing whitespace should be handled. While one likely would assume that whitespace should be ignored in these circumstances, I wanted to ask if there was a preference or convention among existing documents.
It seems like in both the XML and JSON representations of the NIST_SP-800-53_rev5_catalog the keywords prop contains a space after the comma and before each word, generally suggesting that the whitespace should be ignored.
"value": "Assessment, assessment plan, assurance, availability, computer security, confidentiality, control"
Is there any specific guidance on whether whitespace should be ignored or preserved?
I quickly skimmed the profile resolution spec (it is not the only relevant context, but it seemed relevant here) and the other specs for our models, and I didn't find an immediate answer. More to follow, and thanks for asking this!
@aj-stein-nist ideally there is existing guidance; however, if there is not, I see several possible options for handling this, such as one of the following:
- Ignore all whitespace around the commas, including spaces, non-breaking spaces, tabs, carriage returns, and line feeds.
- Require each key word/phrase to be enclosed in single quotes in addition to being separated by commas. Only the content within the single quotes is valid. Outside the single quotes, commas are honored as the separator and all other content is ignored.
- Instead of one `keywords` property with a long list, there could be several `keyword` (no "s") properties, each with a key word or phrase. This would eliminate the ambiguity for software developers.
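The first two options could be sketched roughly as follows. This is only an illustration of the proposals above, not anything OSCAL defines: the function names are hypothetical, and the quoted variant is simplified in that it collects quoted phrases without checking the commas between them and does not handle keywords that themselves contain an apostrophe.

```python
import re


def split_ignoring_whitespace(value: str) -> list[str]:
    """Option 1 (sketch): split on commas and drop surrounding whitespace."""
    # str.strip() with no arguments removes Unicode whitespace, which covers
    # spaces, tabs, carriage returns, line feeds, and non-breaking spaces.
    items = (item.strip() for item in value.split(","))
    return [item for item in items if item]  # skip empty entries


def split_quoted(value: str) -> list[str]:
    """Option 2 (simplified sketch): keep only text inside single quotes."""
    return re.findall(r"'([^']*)'", value)


raw = "Assessment, assessment plan,\u00a0assurance ,\tavailability\n"
print(split_ignoring_whitespace(raw))
# ['Assessment', 'assessment plan', 'assurance', 'availability']

print(split_quoted("'assessment plan', junk 'assurance','control'"))
# ['assessment plan', 'assurance', 'control']
```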
Per our updated ADR on process, we will likely discuss this in this week's issue triage and backlog refinement meeting. If not, in one soon after. Thanks for your comments and feedback. The team should consult existing guidance (I did not find any when I last checked) and, barring that, look at what implementations do.
Hi @tuckerzp, thanks for your patience. To answer your immediate question: no, there is no specification for white space processing of a prop's @value or its @name for that matter. They are just strings, as they are defined for prop all over the respective models, and even elsewhere not in prop. This is true of any model element that is "string" or does not have an explicit type in the Metaschema modules.
That said, it appears there are better ways to generate this instance of the keywords property in the NIST catalog. For awareness, keywords are part of a NIST publication requirement for special publications, of which SP 800-53 is one, and this prop (which isn't mandatory) stores them to help render PDFs later. This specific prop in this specific location is less about machine-readable reuse than it appears.
Given that, is there a particular need to define the whitespace processing for this prop given a scenario you have in an application you are making? All props? @brian-easyd makes some recommendations I can agree with (separate from the question, they are worth considering). When we hear more about the challenges it poses for you as-is, aside from answering the question, we can communicate expectations about how you would use it as-is today or how it could be changed to one of the alternatives Brian proposed. Sound fair?
Please let us know about your scenario and the challenges so we can prioritize the work, thanks in advance.
There have been no explicit multi-valued prop values defined (as opposed to pushed onto the stage) to date. prop has a very simple value type at this time. The only obvious way to accommodate such is to have an attribute which allows a multi-valued value to be tokenized (in XML as well as JSON while surviving serialization in both). As mentioned, value is just a string. Tokenizing on ,\s* is an arbitrary convention, not a definition (and prop lacks the capability to express such a definition — @class and @ns are insufficient). Were there such an attribute, the regex for tokenization would be useful. That presumes a regex scheme suitable for XML and JSON.
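To make the point about `,\s*` being only a convention concrete, here is a small sketch of what that regex does and does not cover. The `tokenize` helper is hypothetical; nothing in OSCAL or Metaschema defines it.

```python
import re

# The separator pattern mentioned above: a comma plus any whitespace after it.
# This is an arbitrary convention, not an OSCAL-defined rule.
TOKEN_SEPARATOR = re.compile(r",\s*")


def tokenize(value: str) -> list[str]:
    """Split a prop value on the `,\\s*` convention."""
    return TOKEN_SEPARATOR.split(value)


print(tokenize("Assessment, assessment plan,assurance"))
# ['Assessment', 'assessment plan', 'assurance']

# Whitespace before a comma, or at either end of the value, survives.
print(tokenize(" control , privacy "))
# [' control ', 'privacy ']
```

The second call shows why the pattern alone is not a complete rule: a consumer still has to decide what to do with leading, trailing, and pre-comma whitespace.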
I agree with @GaryGapinski on the point about there being no other explicit multi-valued prop values; however, there is precedent in OSCAL for using multiple prop fields with the same name and distinct values for situations like this.
For example, an inventory-item with multiple IP addresses uses multiple prop fields with the same name ("ipv4-address"), each with a distinct IP address in the value attribute.
I believe this is the best way to handle key words as well: a series of prop fields in the metadata, each with @name='keyword' (singular) and each with a distinct keyword value.
This also avoids refactoring the overall syntax for prop, which could have a large impact on developers at this point.
I agree (particularly the benefit that it does not require any schema change). I think that implies that such a set of props would end up being a map to an array in JSON, e.g.
{
  "keyword": [
    "kw1",
    "kw2",
    "kw3",
    "etc. etc."
  ]
}
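As a rough sketch of how an application might move from the single comma-separated prop to the repeated-prop form, and then to the grouped map shown above: the helper names are hypothetical, the `keyword` prop name is only the proposal from this thread, and each prop is assumed to be a plain dict with `name` and `value` keys.

```python
def expand_keywords(props: list[dict]) -> list[dict]:
    """Replace a single 'keywords' prop with one 'keyword' prop per entry."""
    expanded = []
    for prop in props:
        if prop.get("name") == "keywords":
            for word in prop["value"].split(","):
                word = word.strip()  # assumed convention: ignore outer whitespace
                if word:
                    expanded.append({"name": "keyword", "value": word})
        else:
            expanded.append(prop)  # leave every other prop untouched
    return expanded


def group_by_name(props: list[dict]) -> dict[str, list[str]]:
    """Collect the values of same-named props into one list per name."""
    grouped: dict[str, list[str]] = {}
    for prop in props:
        grouped.setdefault(prop["name"], []).append(prop["value"])
    return grouped


props = [{"name": "keywords", "value": "kw1, kw2, kw3"}]
print(group_by_name(expand_keywords(props)))
# {'keyword': ['kw1', 'kw2', 'kw3']}
```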
@brian-easyd and @GaryGapinski, we all seem to be on the same page about alternative ways of handling that specific prop. Re the more general case, I still want to hear feedback from Zach on the following to prioritize and scope this work.
Given that, is there a particular need to define the whitespace processing for this prop given a scenario you have in an application you are making? All props? @brian-easyd makes some recommendations I can agree with (separate from the question, they are worth considering). When we hear more about the challenges it poses for you as-is, aside from answering the question, we can communicate expectations about how you would use it as-is today or how it could be changed to one of the alternatives Brian proposed. Sound fair?
For values (in metadata or elsewhere) that cannot contain spaces, constraining beyond string to token (Metaschema data value types) can help with this. For values that can contain spaces things are a little more difficult. The key is to define a normalization rule that can apply on any input - for example, trimming leading and trailing whitespace with a given definition of whitespace; or doing both these and also collapsing runs of space. Given such a rule, both conversions/normalizations, and validations (detecting un-normalized forms) are feasible. However, its scope of application must also be defined correctly, and "all props" or even "all props in metadata" are both probably too broad.
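A normalization rule of the kind described here might look like the sketch below. It assumes one particular definition of whitespace (whatever Python's `\s` and `str.strip()` match for Unicode strings, which includes non-breaking spaces); a real rule would need to state its own definition and its scope. The function names are hypothetical.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize(value: str) -> str:
    """Collapse runs of whitespace to one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(" ", value).strip()


def is_normalized(value: str) -> bool:
    """Validation side of the rule: detect values that are not yet normalized."""
    return value == normalize(value)


print(normalize("  assessment \t plan\n"))    # assessment plan
print(is_normalized("assessment plan"))       # True
print(is_normalized("assessment  plan "))     # False
```

Because the same function backs both conversion and validation, the two cannot drift apart, which is the property the paragraph above relies on.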
Be that as it may, the solution is the same: external definition of constraints over data elements in OSCAL, and an example of how to do this in a local application (perhaps with respect to metadata properties in particular).