Add guidance on newlines in tables

Open bradh opened this issue 4 years ago • 9 comments

User Story:

As an OSCAL producer or consumer, I need to know if tables can contain newlines / line breaks, and if so, how to encode / interpret them.

Goals:

Add documentation to explain newlines / line breaks in tables. The Australian ISM uses this representation:

<table>
	<thead style="background-color: #dbe5f1;">
		<tr>
			<td>
			<p><strong>Regular User Account</strong></p>
			</td>
			<td>
			<p><strong>Unprivileged Administration Account</strong></p>
			</td>
			<td>
			<p><strong>Privileged Administration Account</strong></p>
			</td>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>
			<p>Unprivileged account</p>
			</td>
			<td>
			<p>Unprivileged account</p>
			</td>
			<td>
			<p>Privileged account</p>
			</td>
		</tr>
		<tr>
			<td>
			<p>Used for web and email access</p>

			<p>Used for day-to-day non-administrative tasks</p>
			</td>
			<td>
			<p>Used for authentication to dedicated administrator workstation</p>

			<p>Used for authentication to jump server(s)</p>
			</td>
			<td>
			<p>Used for performance of administration tasks</p>
			</td>
		</tr>
		<tr>
			<td>
			<p> </p>
			</td>
			<td>
			<p>Different username and passphrase to regular user account</p>
			</td>
			<td>
			<p>Different username and passphrase to regular user account</p>
			</td>
		</tr>
	</tbody>
</table>

My OSCAL for the same content (not from the HTML, but source Word document):

                    <table>
                        <tr>
                            <th>Regular User Account</th>
                            <th>Unprivileged Administration Account</th>
                            <th>Privileged Administration Account</th>
                        </tr>
                        <tr>
                            <td>Unprivileged account</td>
                            <td>Unprivileged account</td>
                            <td>Privileged account</td>
                        </tr>
                        <tr>
                            <td>Used for web and email access
Used for day-to-day non-administrative tasks</td>
                            <td>Used for authentication to dedicated administrator workstation
Used for authentication to jump server(s)</td>
                            <td>Used for performance of administration tasks</td>
                        </tr>
                        <tr>
                            <td></td>
                            <td>Different username and passphrase to regular user account</td>
                            <td>Different username and passphrase to regular user account</td>
                        </tr>
                    </table>

(note the use of literal breaks)

In Markdown:

| Regular User Account                                                           | Unprivileged Administration Account                                                                          | Privileged Administration Account                         |
| ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- |
| Unprivileged account                                                           | Unprivileged account                                                                                         | Privileged account                                        |
| Used for web and email access<br/>Used for day-to-day non-administrative tasks | Used for authentication to dedicated administrator workstation<br/>Used for authentication to jump server(s) | Used for performance of administration tasks              |
|                                                                                | Different username and passphrase to regular user account                                                    | Different username and passphrase to regular user account |

(note the use of <br/> tags)

The conflict here is trying to keep close to the source document (ideally automatically extracting OSCAL from the Word document) while also being OSCAL compliant.
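As a rough illustration of the automated extraction described above, here is a minimal Python sketch that turns a simple table with literal line breaks in its cells into a Markdown pipe table using `<br/>`. The function names and the shortened sample are assumptions made for this example, and it only shows the Markdown side; whether OSCAL should accept that `<br/>` is exactly the open question in this issue.

```python
import xml.etree.ElementTree as ET


def cell_to_markdown(text):
    """Join a cell's literal lines with <br/> and escape pipe characters."""
    lines = [line.strip() for line in (text or "").splitlines()]
    lines = [line for line in lines if line]  # drop blank lines
    return "<br/>".join(lines).replace("|", "\\|")


def table_to_markdown(table_xml):
    """Render a simple <table> of <tr>/<th>/<td> as a Markdown pipe table."""
    table = ET.fromstring(table_xml)
    rows = []
    for tr in table.iter("tr"):
        rows.append([cell_to_markdown("".join(cell.itertext())) for cell in tr])
    header, body = rows[0], rows[1:]
    out = ["| " + " | ".join(header) + " |",
           "| " + " | ".join("---" for _ in header) + " |"]
    out += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(out)


# Shortened, two-column version of the table in this issue.
SAMPLE = """<table>
  <tr><th>Regular User Account</th><th>Privileged Administration Account</th></tr>
  <tr><td>Used for web and email access
Used for day-to-day non-administrative tasks</td><td>Used for performance of administration tasks</td></tr>
</table>"""

print(table_to_markdown(SAMPLE))
```

The sketch assumes the first row is the header and that cells contain only text; nested markup inside a cell would need more careful handling.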

Dependencies:

None identified

Acceptance Criteria

  • [ ] All OSCAL website and readme documentation affected by the changes in this issue have been updated. Changes to the OSCAL website can be made in the docs/content directory of your branch.
  • [ ] A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
  • [ ] The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.

{The items above are general acceptance criteria for all User Stories. Please describe anything else that must be completed for this issue to be considered resolved.}

bradh avatar Nov 29 '19 06:11 bradh

This is tough. Apparently there is no pure-markdown solution, only the escape-into-HTML <br>.

So far we have not permitted br even though it is occasionally used as a necessary workaround to problems of this sort -- which are due to deeper modeling issues such as what are the proper contents of table cells, and specifically whether that is inline 'soup' or structured, or either, or a mix. HTML has no solution (effectively supporting a mix); Markdown notation for tables implies inline soup, but markdown has no signal for <br/> in this context, hence the escape back into angle brackets.

Having no very clean solution to offer, I am forced to wonder to what extent unstructured tabular data -- with latent structure -- needs to be supported in OSCAL source.

wendellpiez avatar Dec 12 '19 13:12 wendellpiez

More constructively, I agree we need to offer guidance. Not sure what guidance to offer beyond 'don't use tables' which might not be helpful. Sketching the limits here -- OSCAL tables don't support lists or sequences of lines, only single lines -- would potentially be helpful.

This could be a work item for docs, at least.

wendellpiez avatar Dec 12 '19 13:12 wendellpiez

@bradh At the moment, including a <br/> will cause the OSCAL content to not validate.

We are limited by what we can support in Markdown and what can be roundtripped in the conversion from Markdown -> HTML -> Markdown. We might be able to deal with <br/>, but this will make implementation of this conversion process much more difficult.

We need to think about this one.

david-waltermire avatar Jan 09 '20 17:01 david-waltermire

I've raised this before, but am going to make one more push here, since we'll have to live with the decision long term following the official 1.0.0 publication:

I believe Markup-Multiline fields should always be a robust subset of HTML5, regardless of OSCAL format (XML, JSON, or YAML), even if that means our schema validation tools can't validate the content within the markup-multiline fields.

Our OSCAL formats are intended to be read and manipulated by computers, not people. I believe it is fair to have tools insert the appropriate escape characters required for storing HTML5 in JSON. Our converters should be able to add/interpret the escape characters when converting between formats, and otherwise leave the content as-is.
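To make the escaping point concrete, here is a small Python sketch using the standard `json` module. The `description` key and the helper names are hypothetical, chosen only for illustration; the point is that a generic JSON serializer adds and removes the escape characters and leaves the HTML itself untouched.

```python
import json

# An HTML fragment with characters that need escaping inside a JSON string
# (double quotes and a newline).
HTML_FRAGMENT = (
    '<table><tr><td class="note">Used for web and email access<br/>\n'
    'Used for "day-to-day" non-administrative tasks</td></tr></table>'
)


def to_json_field(html, key="description"):
    """Serialize an HTML string as a JSON object member; json.dumps does the escaping."""
    return json.dumps({key: html})


def from_json_field(encoded, key="description"):
    """Read the HTML string back out, undoing the JSON escaping."""
    return json.loads(encoded)[key]


encoded = to_json_field(HTML_FRAGMENT)
print(encoded)
print(from_json_field(encoded) == HTML_FRAGMENT)
```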

We have clear use cases for robust formatting, including the use of tables in control responses, and continue to see evidence that while Markdown is sufficient for some organizations, it is not sufficient for others. We are being artificially limited by the use of different markup-multiline formats in the different OSCAL formats.

Or perhaps there can be a property on markup-multiline fields indicating the format (MD or HTML), so that organizations who prefer MD can continue to use it, but organizations who require the more robust HTML formatting are also able to do so. At least that is something we could add as a non-breaking change post 1.0.0 delivery. (The absence of such a property causes things to work as they do now.)

brian-ruf avatar Nov 19 '20 18:11 brian-ruf

@brianrufgsa @david-waltermire-nist I wonder if we shouldn't consider an entirely different approach to this requirement. Maybe instead of looking at markup-multiline we should consider permitting embedded HTML through the "any" construct (for which we have already stipulated nominal support, in some places).

<item>
  <title>A1</title>
  <description>
    <p>OSCAL description, and/or ...</p>
    <body xmlns="an-html-namespace"> ... near-HTML goes here ... <br/> ... and here ... </body>
  </description>
</item>

This would not address issues on the JSON side but there might be mitigations we could support, such as letting there be a link to out of line HTML on the JSON side.

Indeed, along those lines, in either XML or JSON OSCAL, we could define a specialized link/@rel as an "include" mechanism, meaning any processor could pick up and expand to include the referenced content at the point of call. (Although I suppose there are now questions of appropriate MIME types, etc.)

<item>
  <title>A1</title>
  <description>
    <link rel="include" href="fragment.html#a1"/>
  </description>
</item>

In my view both these mechanisms (literal HTML inline or using links to reference) would be easier and cleaner than Markdown-extension, for both developers, and organizations that have to take on burdens of rules definition and enforcement to whatever extent OSCAL says "anything goes". While they present problems in JSON/YAML representations, those are no worse than what we face extending the Markdown syntax to support (even some subset of) "office document semantics".
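A minimal Python sketch of how a processor might expand the proposed include links, assuming the referenced fragment is well-formed XML. Note that `rel="include"` is only the mechanism floated in this comment, not an existing OSCAL feature, and the `FRAGMENTS` dictionary is a hypothetical stand-in for actually resolving the `href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for fetching "fragment.html#a1". A real processor would
# resolve the href and check the media type before including anything.
FRAGMENTS = {
    "fragment.html#a1": "<p>Used for web and email access<br/>Used for day-to-day tasks</p>",
}


def expand_includes(root, fragments):
    """Replace each <link rel="include"> with the parsed fragment it references."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "link" and child.get("rel") == "include":
                replacement = ET.fromstring(fragments[child.get("href")])
                replacement.tail = child.tail  # keep any text that followed the link
                parent[index] = replacement
    return root


item = ET.fromstring(
    "<item><title>A1</title><description>"
    '<link rel="include" href="fragment.html#a1"/>'
    "</description></item>"
)
expand_includes(item, FRAGMENTS)
print(ET.tostring(item, encoding="unicode"))
```

The sketch ignores namespaces and raises `KeyError` for an unknown `href`; both would need a deliberate policy in a real implementation.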

wendellpiez avatar Nov 22 '20 17:11 wendellpiez

Perturbing factors to consider:

  • However we represent new lines or (related but different) paragraphs (such as a mix of p, ul, or even a nested table) inside td (and th?), it must have a graceful JSON/YAML/Markdown representation (right?). So one question to pose is how JSON or YAML consumers would like this data to look (how does CommonMark do it, etc.).
  • We can also provide alternative strategies for encoding particular use cases --
    • Often a clean structured representation in the data with explicit "tables" only in the representation (the display) is the way to go
    • Or information can be tagged out of line (in an external document) and referenced

wendellpiez avatar Aug 22 '22 15:08 wendellpiez

We follow CommonMark as a base specification for markup, which doesn't support tables. We use the GitHub Flavored Markdown table extension to support tables.

CommonMark supports HTML blocks and inline raw HTML, which can be used to embed HTML in Markdown. However, the current HTML datatype support in OSCAL does not allow this.

To move forward we need to either:

  1. Disallow inline HTML in markup.
  2. Allow a subset of HTML in markup (e.g. <br/>) to support newlines and similar use cases.
  3. Allow full support for inline HTML.

Option 1 is easy, requiring no extra work, but limiting functionality.

For options 2 and 3, the XSLT implementation would need to be enhanced to support this. The liboscal-java implementation has support for full inline HTML, but is largely untested so some aspects may not work.

If support for all or a subset of inline HTML is desired, test content will need to be engineered to ensure proper implementation support.
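As one way to picture option 2, the following Python sketch checks markup text against an allow-list of raw HTML tags using the standard `html.parser` module. The allow-list contents and the names `ALLOWED_INLINE_HTML` and `rejected_tags` are assumptions for illustration; the thread only names `<br/>` explicitly, and this is not how the XSLT or liboscal-java implementations work.

```python
from html.parser import HTMLParser

# Hypothetical allow-list for option 2; only <br/> is named in the discussion.
ALLOWED_INLINE_HTML = {"br"}


class InlineHtmlChecker(HTMLParser):
    """Collect the names of raw HTML start tags that are not on the allow-list."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br/>, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag not in self.allowed:
            self.rejected.append(tag)


def rejected_tags(markdown_text, allowed=ALLOWED_INLINE_HTML):
    """Return the raw HTML tag names in the text that the allow-list does not permit."""
    checker = InlineHtmlChecker(allowed)
    checker.feed(markdown_text)
    checker.close()
    return checker.rejected


print(rejected_tags("| Used for web access<br/>Used for email | <span>x</span> |"))
```

This is deliberately naive: it scans the whole string as HTML, so Markdown autolinks or angle brackets inside code spans could be misread as tags. A real check would run on the raw-HTML nodes produced by a CommonMark parser.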

david-waltermire avatar Sep 23 '22 19:09 david-waltermire

In today's model review, there was a pretty active discussion on formatting of prose in OSCAL (specifically, markup-multiline with complex structures around tables, lists, and whitespace management), but no registered interest in the particular <br/>/newline-in-table issue or in any specific use case similar to it. That leaves us open to review the 3 options above as we see fit for that particular use case, barring other feedback in subsequent comments today or following the model review.

aj-stein-nist avatar Sep 30 '22 18:09 aj-stein-nist

Just noting we should couple any action on this with unit testing of bidirectional conversion of (wrapped and unwrapped) markup-line and markup-multiline.

Indeed due to the nature of Markdown (lack of a grammar) this is really the only way of validating it: converting Markdown to markup (in this case OSCAL XML) with a conformant engine, then converting back, then comparing. (Even this will not be enough for free-form kinds of Markdown.) This implies ensuring conformance, which is where the unit tests come in.
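The convert, convert back, compare idea can be sketched in Python as follows. The two converters are toy stand-ins written for this example (one pipe-table row to a `<tr>` element and back, with `<br/>` kept as an element); they are not a conformant Markdown engine and are unrelated to the XSpec tests mentioned below.

```python
import xml.etree.ElementTree as ET


def row_to_xml(markdown_row):
    """Toy converter: one pipe-table row -> <tr>, with <br/> as a child element."""
    cells = [c.strip() for c in markdown_row.strip().strip("|").split("|")]
    tr = ET.Element("tr")
    for cell in cells:
        td = ET.SubElement(tr, "td")
        parts = cell.split("<br/>")
        td.text = parts[0]
        for part in parts[1:]:
            br = ET.SubElement(td, "br")
            br.tail = part  # text after a <br/> lives in its tail
    return tr


def xml_to_row(tr):
    """Toy converter: <tr> -> one pipe-table row in a fixed spacing style."""
    cells = []
    for td in tr:
        text = td.text or ""
        for child in td:
            text += "<br/>" + (child.tail or "")
        cells.append(text)
    return "| " + " | ".join(cells) + " |"


def round_trips(markdown_row):
    """True if Markdown -> XML -> Markdown reproduces the input exactly."""
    return xml_to_row(row_to_xml(markdown_row)) == markdown_row


ROW = ("| Used for web and email access<br/>Used for day-to-day non-administrative tasks"
       " | Used for performance of administration tasks |")
print(round_trips(ROW))
print(round_trips("|a<br>b|c|"))
```

A strict string comparison like this also flags inputs that differ only in spacing or in `<br>` versus `<br/>`, so a real test suite would compare against a normalized form. Escaped pipes inside cells are not handled here.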

XSpec that can provide a foundation for this was merged with usnistgov/metaschema#218.

wendellpiez avatar Oct 03 '22 15:10 wendellpiez

Given that there are no strong opinions, I believe option #2 is potentially a good way forward to adopt additional HTML tagging over time. I agree with @wendellpiez that round-trip unit testing is needed here. Perhaps we could keep OSCAL as-is for now and explore this more after the OSCAL 1.1 release?

Anyone have feedback on this proposed way forward?

david-waltermire avatar Oct 17 '22 15:10 david-waltermire

+1 to maintain as-is for now.

GaryGapinski avatar Oct 17 '22 16:10 GaryGapinski

@david-waltermire-nist I agree option #2 sounds like the best balance, and agree with @wendellpiez on the need to include any expanded HTML-in-MD tagging in unit testing.

While I can confirm that not having a new-line ability within a table cell will block many FedRAMP SSPs from being faithfully converted to OSCAL without re-work of the content, I cannot say how much demand (if any) there is for people actually converting those SSPs at this time.

I suspect this is not yet urgent, but at whatever point we start to see an up-tick in OSCAL adoption among systems with legacy SSPs, it will become urgent. So I think there is time. It would be nice to not wait until OSCAL 2.0, and I believe a sub-set can be implemented as a non-breaking change. Just my $0.02 on timing.

brian-comply0 avatar Oct 17 '22 18:10 brian-comply0

Given the feedback above, I think we should create and reference some issues around better testing of Markdown <-> HTML conversion and close this for now. Any concerns with this approach?

david-waltermire avatar Nov 02 '22 16:11 david-waltermire