Tracking control identity via catalog of origin

Open wendellpiez opened this issue 4 years ago • 23 comments

User Story:

In specifying and implementing profile resolution (#508, #509) we have exposed a requirement to support tracking 'control identity' more robustly through profile resolution. This is so that when profiles import profiles, the controls they import can be correctly matched with controls from other import pathways (such as source catalogs or other profiles of the same source catalogs).

A simple design extension, with two new flags (one each for control and catalog) could address this.

Details

A profile, call it profileX:

<profile id="AAA000">
  <import href="#profileY">
    <include>
      <call control-id="a1"/>
    </include>
  </import>
  <import href="#catalogZ">
    <include>
      <all/>
    </include>
  </import>
</profile>

source profileY:

<profile id="abc789">
  <import href="#catalogZ">
    <include>
      <call control-id="a1"/>
    </include>
  </import>
  <modify>
    <alter control-id="a1">
      <add position="starting">
        <prop name="status">NEW</prop>
      </add>
    </alter>
  </modify>
</profile>

source catalogZ:

<catalog id="xyz123">
  <control id="a1">...</control>
  <control id="a2">...</control>
  <control id="a3">...</control>
</catalog>

Note that profileX selects control a1 twice: once in modified form (from profileY) and once in its original form in CatalogZ.

There are three "combination rules" for merging: "keep", "use-first" and "merge". Keep is easy - it says, keep both a1 controls and detect the clash downstream. This option is presumably most useful for dev/testing, although if a profile is written correctly there is no error, hence no harm in it. (In this case, the second import could exclude the control as it isn't actually wanted from the catalog.)

But how do we determine that the two 'a1' controls are the same for purposes of the "use-first" and "merge" options? This can be dramatized by examining the catalog that results from resolving the imported ProfileY:

<catalog id="abc789-RESOLVED">
  <control id="a1">
    ... <prop name="status">NEW</prop>  ...
  </control>
</catalog>

When we combine this with catalogZ, we have no way of knowing here that control 'a1' originated from the same catalog (with id="xyz123"), and is not some totally different source.

This problem is compounded by the likelihood that a catalog @id does not persist across different released/revised/published versions of the same catalog, so it is not reliable as a disambiguator.

Proposal

  1. Add a catalog/@canonical-id flag to enable marking a catalog for 'persistent identity' across versions.
  2. Permit controls to carry a flag showing their catalog of origin, so that the origin is preserved in intermediate catalogs.

If our original source catalogZ has:

<catalog id="xyz123" canonical-id="ZZZZZ">
  <control id="a1">...</control>
  <control id="a2">...</control>
  <control id="a3">...</control>
</catalog>

By propagating the value of the catalog's canonical-id to the controls, the results of resolving ProfileY could look like this:

<catalog id="abc789-RESOLVED">
  <control id="a1" origin-id="ZZZZZ">
    ... <prop name="status">NEW</prop> ...
  </control>
</catalog>

Now the fact that 'a1' derives from catalogZ in both cases can be determined (from a value of ZZZZZ, either as an @origin-id or on an ancestor catalog/@canonical-id), and a resolution of ProfileX can look like this (assuming the merge method 'use-first' is applied):

<catalog id="AAA000-RESOLVED">
  <control id="a1" origin-id="ZZZZZ">
    ... <prop name="status">NEW</prop> ...
  </control>
  <control id="a2" origin-id="ZZZZZ">...</control>
  <control id="a3" origin-id="ZZZZZ">...</control>
</catalog>

The origin-id attribute would then persist: because it tracks the catalog of origin, it is not rewritten by subsequent profile resolution steps, as it might be needed at any time downstream.
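
To make the proposal concrete, here is a minimal Python sketch of 'use-first' keyed on the pair (catalog of origin, control id). It assumes the proposed canonical-id and origin-id flags exist as shown in the mock-ups above; the function names identity and use_first are hypothetical and are not part of any OSCAL tool.

```python
import xml.etree.ElementTree as ET

# catalogZ with the proposed canonical-id flag (control content elided).
CATALOG_Z = """
<catalog id="xyz123" canonical-id="ZZZZZ">
  <control id="a1"/>
  <control id="a2"/>
  <control id="a3"/>
</catalog>
"""

# Stand-in for the result of resolving profileY: a1 already carries origin-id.
RESOLVED_Y = """
<catalog id="abc789-RESOLVED">
  <control id="a1" origin-id="ZZZZZ"><prop name="status">NEW</prop></control>
</catalog>
"""


def identity(control, catalog):
    """Return (origin, control id), falling back to the catalog's canonical-id."""
    origin = control.get("origin-id") or catalog.get("canonical-id")
    return (origin, control.get("id"))


def use_first(catalogs):
    """Keep the first control seen for each (origin, id) pair, in import order."""
    seen = {}
    for catalog in catalogs:
        for control in catalog.iter("control"):
            key = identity(control, catalog)
            if key in seen:
                continue  # a later import of the same control is dropped
            if key[0] is not None:
                # Stamp the origin so it persists in the result catalog.
                control.set("origin-id", key[0])
            seen[key] = control
    return list(seen.values())


# Import order of profileX: profileY first, then catalogZ.
result = use_first([ET.fromstring(RESOLVED_Y), ET.fromstring(CATALOG_Z)])
```

With these inputs, the modified a1 from profileY wins over the unmodified a1 from catalogZ, and a2 and a3 come through from catalogZ with the origin stamped on them.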

The same issue arises with groups for merging purposes under merge/as-is.

NOTE:

This design permits correct use-first or merging behavior, but it does not rewrite IDs; thus if controls with colliding IDs are imported from two different sources, they will (correctly) not be merged and validation errors will presumably result. So it is still necessary to see to it that IDs do not clash between catalogs to be combined.

Further note:

We could also specify operations on document metadata and/or back matter, to track how a profile is made. So the metadata of a result catalog (even if only in memory as a profile is resolved) will say something about its sources, and upstream catalogs could be referenced.

Goals:

  • A solution is designed and tested to permit implementation of the 'use-first' and 'merge' combination rules for merging in profile resolution.
  • Any new structures (such as new flags) required for this solution are deployed to appropriate schemas.
  • Unit tests for profile resolution utilities reflect the defined semantics and provide tests for validating their application by processors.

Dependencies:

Linked to both #508 and #509.

The solution should also be unit tested.

Acceptance Criteria

  • [ ] All OSCAL website and readme documentation affected by the changes in this issue have been updated. Changes to the OSCAL website can be made in the docs/content directory of your branch.
  • [ ] A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
  • [ ] The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.

wendellpiez avatar Nov 20 '19 18:11 wendellpiez

First, I would encourage - as a best practice among profile creation tools - all conflicts be called to the attention of the profile creator at the time of creation, with the intention of deconflicting the profile's control references before it is ever processed for resolution.

I like the @canonical-id in theory, but am concerned that this requires catalog creators to have another ID to manage for their catalog, and that management would need to be under a clear set of guidelines. For example, after NIST SP 800-53r4 was released, there was an update with (mostly) corrections and a few tweaks released at some point during its first year. Most of the controls did not change at all. A few did.

If they have the same @canonical-id value, we would fail to correctly process the few controls that changed. If the @canonical-id changed, we would fail to correctly process the majority of controls that did not change.

Also, we would have to make the @canonical-id required or it will not be present for a given catalog. We may need to require it to be unique as well. At that point, we should just use the UUID instead, if we are going to take this approach.

However, I would like to suggest an alternate approach:

  • Ambiguous control references are treated as erroneous and not allowed within a profile.
  • Instead of a @canonical-id at the root of a catalog, add an optional @id to the profile import field, and @import-id to the profile alter field.
  • In the case of conflicting control IDs, use the @import-id in the alter assembly to indicate which version of a control is being modified.
  • When resolving a profile, the value of the @import-id could become the @class of the control.

This would allow the resulting catalog to unambiguously trace back to the control's origin without reliance on potentially missing or misleading @canonical-id values from a catalog. It also puts the responsibility on the profile creator to explicitly handle any conflicts.

Sadly, while I like the idea of directives that indicate accepting the first or last, I think that only works for XML as JSON processing cannot be trusted to process import statements in their original sequence.

brian-ruf avatar Nov 20 '19 23:11 brian-ruf

As for how conflicts should be managed by tools, whether while authoring a profile, or resolving one, I see the following functional steps (which ignore current tool capabilities for the sake of this discussion):

BARE MINIMUM TARGET

  1. Catch any duplicate IDs
  2. Warn the user of any conflicting IDs
  3. Treat as not compliant with OSCAL until the conflicts are addressed.

DESIRED MINIMUM TARGET

  1. Catch any duplicate IDs
  2. Perform a comparison of all duplicated IDs.
  3. Silently drop exact duplicates
  4. Warn the user of any conflicting IDs that are not exact duplicates
  5. Show the differences
  6. Treat as not compliant with OSCAL until the conflicts are addressed.

ADVANCED TARGET

  1. Catch any duplicate IDs
  2. Perform a comparison of all duplicated IDs.
  3. Silently drop exact duplicates
  4. Warn the user of any conflicting IDs that are not exact duplicates
  5. Assess the differences
  6. Show user the differences, and:
    • State the degree of differences cited;
    • Recommended deconfliction actions; and,
    • Automatically take recommended deconfliction actions if approved by the user.
  7. Treat as not compliant with OSCAL until the conflicts are addressed.
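
The "desired minimum target" steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each control is reduced to an (id, text) pair, and the function names deconflict and show_differences are hypothetical.

```python
import difflib
from collections import defaultdict


def deconflict(controls):
    """controls: list of (control_id, content) pairs in import order.

    Returns (kept, conflicts). kept holds controls whose id is unambiguous
    once exact duplicates are dropped; conflicts maps each remaining
    duplicated id to its differing contents.
    """
    by_id = defaultdict(list)
    for control_id, content in controls:
        # Steps 1-3: group by id, compare, silently drop exact duplicates.
        if content not in by_id[control_id]:
            by_id[control_id].append(content)
    kept = [(cid, v[0]) for cid, v in by_id.items() if len(v) == 1]
    conflicts = {cid: v for cid, v in by_id.items() if len(v) > 1}
    return kept, conflicts


def show_differences(conflicts):
    """Steps 4-5: produce unified-diff lines for each conflicting id."""
    lines = []
    for cid, variants in conflicts.items():
        for other in variants[1:]:
            lines.extend(difflib.unified_diff(
                variants[0].splitlines(), other.splitlines(),
                fromfile=f"{cid} (first)", tofile=f"{cid} (later)",
                lineterm=""))
    return lines


controls = [("a1", "text A"), ("a2", "text B"), ("a1", "text A"),
            ("a3", "x"), ("a3", "y")]
kept, conflicts = deconflict(controls)
report = show_differences(conflicts)
# Step 6: the profile is treated as non-compliant while conflicts remain.
compliant = not conflicts
```

Here the repeated a1 is dropped silently, while a3 is reported to the user and blocks compliance until a human resolves it.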

For all of the above, if we adopt either the @canonical-id approach suggested by @wendellpiez, or the @import-id approach suggested in my post above (or both), those could be valid methods of addressing duplication.

My main goal is that a human should always be responsible for explicitly addressing duplicate control references. There is too much effort that goes into satisfying controls to risk an unintended de-duplication action by a processing tool.

brian-ruf avatar Nov 20 '19 23:11 brian-ruf

As an observer, this issue/feature seems overly complicated. IRL users can just point to a git (subversion etc) version/release. I don't think it's helpful for OSCAL to worry about a solved/outsource-able problem.

Version management should not be part of the schema (outside of a UUID or timestamped hash), it should be an updatable only via ref to a url and have a .lock concept, I'd be afraid for it to handle anything more than that.

JJediny avatar Nov 21 '19 01:11 JJediny

@JJediny this issue is about how the syntax in an OSCAL profile is interpreted to create a new catalog. It is about a specific issue in that process where the OSCAL profile specifies the same control (by OSCAL control ID) more than once, creating a conflict.

First, I am unaware of how GitHub could help with this within OSCAL XML or JSON content, and would welcome a proof-of-concept from you as it would save us having to build tools.

Also, OSCAL is intended to be used both within Internet-connected environments and in "offline" environments, such as may be required for classified processing, where there would be no access to a public site such as GitHub.

brian-ruf avatar Nov 21 '19 01:11 brian-ruf

if this is about conflicts in the namespace, I'm confused why versioning is a topic?

JJediny avatar Nov 21 '19 01:11 JJediny

@JJediny, sorry for any confusion. While @wendellpiez does use the word version a couple times, he does so more loosely. Version may not be the best word, as it's not a versioning topic, per se.

It's more about a profile resolving multiple controls with the same control ID (from different import sources) and determining whether they are just duplicates, or something that started out as a duplicate, but was changed by an upstream profile, or something from a different catalog that just happens to have the same control ID.

brian-ruf avatar Nov 21 '19 01:11 brian-ruf

Sorry, I admit I was well off base with my original understanding of this issue, which I now (think?) I understand to be: if standards (e.g. NIST 800-53, SOC2, ISO 27001) have conflicting namespaces, whether across standards or across versions of themselves, there is an issue?

If so, it seems to warrant a canonical index of the control-id namespace. IMHO I would be very opposed to deconflicting this via modifications at the level of the user writing up their SSP, no? Control-ids should be canonical and globally established.

It would be cleaner/easier if each framework published its controls with versioned ids, rather than making that a part of the schema or something for users to modify.

JJediny avatar Nov 21 '19 01:11 JJediny

@brianrufgsa, import statements in the JSON are in an array so that order can be respected. This is one of the features of Metaschema, that it ensures predictability of ordering even in the JSON. (Mostly: there are exceptions around the edges.)
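
As a quick illustration of the point about arrays, a JSON array is parsed into an ordered list, so import order survives a round trip through a JSON parser. The key names in this sketch are illustrative only and are not claimed to match the OSCAL JSON schema.

```python
import json

# Hypothetical JSON shape: imports held in an array, as described above.
profile_json = """
{"profile": {"imports": [
    {"href": "#profileY"},
    {"href": "#catalogZ"}
]}}
"""

imports = json.loads(profile_json)["profile"]["imports"]
# JSON arrays are ordered, so the list preserves document order.
hrefs = [imp["href"] for imp in imports]
```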

Mostly, I really like the thinking/discussion here. Although I can also see the issue is apparently complex enough that we are probably going to need mockups/demos.

wendellpiez avatar Nov 21 '19 13:11 wendellpiez

@JJediny so @brianrufgsa is correct; I should probably have used a different word, maybe 'variant'. You are right though that this is effectively a persistent namespace for control identification. And yes, a canonical index would be a huge help, maybe essential at some level. I also agree with the design goal of solving this upstream from users.

wendellpiez avatar Nov 21 '19 14:11 wendellpiez

Regarding the global scope of a canonical-id value, there are different ways that could be approached ... that could be its own conversation.

wendellpiez avatar Nov 21 '19 14:11 wendellpiez

@JJediny I realize the whole point of OSCAL is to be as machine-readable as possible, thus we want to automate our activities as much as possible, including de-conflicting of controls during an import.

Here is why I assert a human should always have final responsibility for control de-confliction:

  • Control definitions are essentially functional requirements.
  • A great deal of resources are expended by an organization to satisfy each and every control (functional requirement).
  • Based on several studies, the cost of resolving a requirement error (incorrect automated deconfliction action) grows nearly exponentially for each stage of development (1x when defining the requirement, 3x-10x when designing to it, 10x-100x when building to it, 1,000x - 40,000x after deployment).
  • Due to this, it is critical to be absolutely accurate and clear about every control definition (functional requirement) as early as possible.

Conflicting requirement IDs represent an ambiguity of functional requirements. Machines are not yet smart enough to intelligently and accurately de-conflict such ambiguities using judgement and reasoning. They can detect and address exact duplicates. They can assess the degree of difference for non-exact duplicates. They can even recommend changes. They should do everything they can to enable a human to understand the conflict, present options, and take action once a human has rendered a decision. But ultimately, a human needs to "own" the final deconfliction decisions.

brian-ruf avatar Nov 21 '19 14:11 brian-ruf

The different merge behaviors are designed to enable more and less assertive resolutions of conflicts. More assertive resolutions have the advantage of succeeding in producing valid catalogs for a wider range of inputs including (nominally) ambiguous inputs (given a way to resolve such ambiguities). Less assertive resolutions have the advantage of exposing problems in profiles rather than resolving them. This is good when a better solution to such a problem is easily found upstream. (Fix the input so there is no clash to resolve.)

combine with method='keep' is the least assertive combination rule; it will expose the problems by refusing to pick and choose among contenders, instead pushing them all out to "fight amongst themselves" in the output.

combine with method='merge' is the most assertive combination rule: it paves over conflicts by merging the controls detected to be in conflict. I can imagine this being useful as a feature but as a profile author I'm not sure I'd actually want to use it. (If I have controls to combine maybe I'd prefer to write a new control with my edits.)

combine with method='use-first' could be useful for quick-and-dirty profiles (should there be such a thing?) or profiles under development; it also makes it a little easier to use the convenient include/all feature.

The question of control identity - recognizing that AC-2 in one import actually clashes with another -- is essential to the 'merge' and 'use-first' methods, but not the 'keep' method, which amounts to straight up GIGO.
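
The three combination rules can be contrasted in a small Python sketch. It assumes each control has already been given an identity key, and it models 'merge' as a simple union of properties, which is an assumption for illustration and not the specified merge semantics; the function name combine is hypothetical.

```python
def combine(controls, method):
    """controls: list of dicts with 'key' (identity) and 'props', in import order."""
    if method not in ("keep", "use-first", "merge"):
        raise ValueError(f"unknown combination method: {method}")
    if method == "keep":
        # Least assertive: pass everything through, clashes included.
        return [{"key": c["key"], "props": list(c["props"])} for c in controls]
    result = {}
    for c in controls:
        key = c["key"]
        if key not in result:
            result[key] = {"key": key, "props": list(c["props"])}
        elif method == "merge":
            # Most assertive: fold later properties into the first control.
            for prop in c["props"]:
                if prop not in result[key]["props"]:
                    result[key]["props"].append(prop)
        # For 'use-first', later controls with the same key are dropped.
    return list(result.values())


controls = [
    {"key": ("ZZZZZ", "a1"), "props": ["status=NEW"]},
    {"key": ("ZZZZZ", "a1"), "props": ["label=A-1"]},
    {"key": ("ZZZZZ", "a2"), "props": []},
]
kept = combine(controls, "keep")
first = combine(controls, "use-first")
merged = combine(controls, "merge")
```

Note that only 'use-first' and 'merge' consult the identity key at all, which is why control identity matters to them and not to 'keep'.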

So one solution could be to remove the merge options besides 'keep', and let the devil take the hindmost. Instead of supporting any merging of controls, we would rely on tooling and perhaps mandatory error reporting, in resolution, to help authors deconflict the control imports.

Even a canonical-id or "namespacing" mechanism has a problem, however, when two different catalogs have controls with clashing identifiers. Addressing that, unfortunately, implies a feature for reassigning ID values in the result. Which would open another can of worms.

wendellpiez avatar Nov 21 '19 16:11 wendellpiez

Noting today that this issue applies also to parameters (which, like controls, have the potential to clash with other parameters with the same ID, on multiple imports) and potentially to groups.

wendellpiez avatar Nov 26 '19 19:11 wendellpiez

I think it's possibly more basic than a merge need. A control-id isn't unique (in general). The Australian ISM has controls that look like:

Security Control: 0181; Revision: 2; Updated: Sep-18; Applicability: O, P, S, TS Cables are installed in accordance with the relevant Australian Standards, as directed by the Australian Communications and Media Authority (ACMA).

That control identifier (0181) is obviously only unique within the Australian ISM. So when I'm writing a component (or SSP section) that addresses it, I need to have some context information that says "the 0181 I mean is the Australian ISM one". Further, the ISM will revise it (hence the "Revision: 2" part), without changing the control identifier. So compliance linkage / traceability isn't against a control identifier; it's against a triple of namespace-control-version.
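
A minimal Python sketch of such a triple used as a lookup key follows. The class name ControlRef and the namespace strings are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so usable as dict keys
class ControlRef:
    namespace: str   # which catalog the id belongs to (hypothetical values below)
    control_id: str
    revision: int


ism = ControlRef("au-ism", "0181", 2)
other = ControlRef("some-other-catalog", "0181", 2)

# An implementation statement is linked to the full triple, not the bare id.
statements = {ism: "Cabling follows the relevant Australian Standards."}
```

With this key, the same bare id in another catalog, or a later revision of the same control, does not silently match an existing statement.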

bradh avatar Nov 27 '19 22:11 bradh

Agreeing with @bradh. This is more general than simply merging; it is also about addressability from higher layers.

My main question at this point is whether a top-level catalog/@canonical-id, along with the control's ID (governed within the scope of the catalog), will suffice. This is basically namespace-control-version assuming that the canonical-id can provide the scope (namespace-version) part of that.

wendellpiez avatar Dec 02 '19 16:12 wendellpiez

@wendellpiez I can see why that would work OK for a NIST 800-53 style approach, where controls are versioned at the document level. However my control source versions them at the control level, and the document is revised many times (I think 8 times in 2019), although most controls are unchanged from document revision to document revision.

Chasing document revisions is part of the problem I'm hoping to address, so I'd prefer to have version as a separate attribute.

I could (potentially) have it at the control level (basically incorporating the revision in, so the id might be 0181-r4). The downside to that is that if the changes from r4 to r5 are pretty minor, there is probably a good chance that existing component parts relevant to 0181-r4 are applicable to 0181-r5, but I need tools to be able to take apart the id scheme to help me with the transition, rather than just treating the id as opaque.
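
As a sketch of what "taking apart the id scheme" might involve, the following Python assumes the hypothetical "<base>-r<number>" convention from the example above; it is not an OSCAL convention, and the function names are made up for illustration.

```python
import re

# Assumed id convention: base id, then "-r", then a revision number.
ID_PATTERN = re.compile(r"^(?P<base>.+)-r(?P<rev>\d+)$")


def split_revision(control_id):
    """Return (base id, revision) or (id, None) if no revision suffix is present."""
    match = ID_PATTERN.match(control_id)
    if match is None:
        return control_id, None
    return match.group("base"), int(match.group("rev"))


def same_control(a, b):
    """True when two ids name the same control, ignoring the revision suffix."""
    return split_revision(a)[0] == split_revision(b)[0]
```

A tool could use same_control to list existing component parts written against 0181-r4 as candidates for review when 0181-r5 is published.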

bradh avatar Dec 02 '19 21:12 bradh

@bradh Would it be possible to version the document at the document level, but add a property at the control level that indicates which document version the control was last updated by?

david-waltermire avatar Dec 02 '19 21:12 david-waltermire

I do have last-changed and revision properties in my controls: https://github.com/bradh/ism-oscal/blob/master/Australian_Government_Information_Security_Manual_NOV19_catalog.xml#L57 for an example. That is obviously non-standard though.

bradh avatar Dec 02 '19 21:12 bradh

"Standards" are relative and many-layered. If implemented consistently and documented, a solution like this can have the effect of a standard for a user community. I'd like to promote consistent extension-by-restriction whenever possible, as it keeps the baseline schema simpler for everyone. (Moving the problem.)

That's actually not an absolute 'no' fwiw from me as I also feel we need to keep an eye on these things. If everyone has the same need we don't want a dozen ways to do it, either. That's the flip side.

wendellpiez avatar Dec 03 '19 14:12 wendellpiez

@wendellpiez Has this issue been fully addressed in PR #559?

david-waltermire avatar Feb 07 '20 19:02 david-waltermire

@david-waltermire-nist no this has not been fully addressed, but remains an issue to be worked out in testing.

Ultimately we should have unit tests showing pathological inputs so we can detect both intended and unintended control (identity) collisions across imported catalog(s) and/or profile(s).

For now this is a tracking issue for this problem, with notes for possible approaches.

wendellpiez avatar Feb 10 '20 20:02 wendellpiez

This relates to mapping in profile resolution. #843 and #1115

david-waltermire avatar Mar 04 '22 18:03 david-waltermire

Implemented a means to support this in liboscal-java v1.0.4 through use of an identifier mapper.

This needs to be added to the Profile Resolution specification (see #1196).

david-waltermire avatar Jul 01 '22 11:07 david-waltermire