
Investigate making Usage Dictionary parseable

Open daobrien opened this issue 5 years ago • 17 comments

If we formalize the structure of the Usage Dictionary and make it parseable, we can use it with Pre-commit and run tests over content to help with adherence to the guidelines.

daobrien avatar Jan 29 '20 06:01 daobrien

Initial thoughts:

  1. [word]: correct usage, don't use [incorrect-usage]
  2. [word]: don't use. Instead use [correct usage]
  3. [word]: Explanation/definition of the word.

Tables and lists are easy to parse as long as the structure is consistent. Is there a preferred business rule? If not, may I suggest the following structure:

[word] [correct|incorrect] [proper word to use instead] [optional explanation]

That would allow the parser to grab the first item and determine whether it is correct based on the second item. If it is incorrect, the parser can then grab the third item, the proper word to use instead. The fourth item, the optional explanation, is always excluded from the parser.
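A minimal sketch of a parser for the bracketed structure above, assuming one entry per line and square brackets as field delimiters. The `Entry` type and `parse_entry` function are hypothetical names used only for illustration; nothing like them exists in the Style Guide repository.

```python
import re
from typing import NamedTuple, Optional

# Assumed line format, following the bracketed structure proposed above:
#   [word] [correct|incorrect] [proper word to use instead] [optional explanation]
FIELD = re.compile(r"\[([^\]]*)\]")


class Entry(NamedTuple):
    word: str
    is_correct: bool
    replacement: Optional[str]
    explanation: Optional[str]


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into its bracketed fields."""
    fields = [f.strip() for f in FIELD.findall(line)]
    if len(fields) < 2:
        raise ValueError(f"expected at least [word] [status]: {line!r}")
    word, status = fields[0], fields[1].lower()
    if status not in ("correct", "incorrect"):
        raise ValueError(f"unknown status {status!r} in {line!r}")
    # Empty brackets ("[]") are treated as "no value" so that a correct
    # word can still carry an explanation in the fourth position.
    replacement = fields[2] if len(fields) > 2 and fields[2] else None
    explanation = fields[3] if len(fields) > 3 and fields[3] else None
    return Entry(word, status == "correct", replacement, explanation)


entry = parse_entry("[air wall] [incorrect] [air gap] [There is no actual wall to breach.]")
print(entry.word, "->", entry.replacement)
```

Because the fields are positional, an entry that has an explanation but no replacement needs an empty placeholder in the third position. Tagging the first item instead would avoid that ambiguity.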

Another suggestion is simply to add some tagging to the first item, which would allow the parser to determine the correctness of the word (like an item property).

daobrien avatar Jan 29 '20 06:01 daobrien

How should we deal with entries that contain multiple terms? e.g., https://stylepedia.net/style/#b Cases exist where [word] exists as a noun and adjective but not as a verb, etc.

In some cases a bit of a rewrite of the entry will be required, but that's ok because I definitely see the ROI. I'll have a look into what DocBook attributes are available. I'm disinclined to use tables. Remember that the Usage Dictionary is, in fact, a series of <variablelist> elements, with each dictionary character (A, B, C...) being its own chapter.

daobrien avatar Jan 29 '20 06:01 daobrien

Option 3 in the OP also needs to be considered for terms such as:

continuous delivery (CD)
A software implementation architecture that ensures all approved code can be easily 
pushed to production. 

⁠continuous deployment
A special case of continuous delivery, where approved code is automatically pushed 
to production. Do not use "CD" to refer to this practice. 

⁠continuous integration (CI)
A software development architecture where the developer code branch is synchronized 
with the main code branch or master multiple times per day. Development always works 
with the current code base. 

There is no "right/wrong" paradigm here, but rather an explanation of the terms to help authors know what they really mean.

daobrien avatar Jan 31 '20 01:01 daobrien

How about something like this?

<varlistentry id="air-gap">
  <term role="true">air gap</term>
  <term role="false">air wall</term>
  <listitem>
    <para>
      <emphasis>n.</emphasis> Use "air gap" to describe systems that are separated, not by software, but physically. Do not use "air wall." "Air gap" is preferred in technical publications because there is no actual wall that you need to breach, but rather a gap that you need to bridge. You cannot break through something that does not exist.
    </para>
  </listitem>
</varlistentry>

This renders as follows: [image: styleguide-issue-161]
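For reference, a rough sketch of how a script might read the `role` attribute back out of such entries, using Python's standard `xml.etree.ElementTree`. It assumes the file parses as plain, un-namespaced XML; real DocBook sources with entities or a DocBook 5 namespace would need extra handling. `load_terms` and the trimmed `SAMPLE` text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the structure proposed above.
SAMPLE = """
<variablelist>
  <varlistentry id="air-gap">
    <term role="true">air gap</term>
    <term role="false">air wall</term>
    <listitem>
      <para>Use "air gap". Do not use "air wall."</para>
    </listitem>
  </varlistentry>
</variablelist>
"""


def load_terms(xml_text: str) -> dict:
    """Map each entry id to its terms, grouped by the value of the role attribute."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.iter("varlistentry"):
        by_role = {}
        for term in entry.findall("term"):
            # Terms without a role attribute are collected under "unspecified".
            role = term.get("role", "unspecified")
            text = "".join(term.itertext()).strip()
            by_role.setdefault(role, []).append(text)
        entries[entry.get("id")] = by_role
    return entries


terms = load_terms(SAMPLE)
print(terms["air-gap"])
```

Grouping by role keeps the script independent of which role values are eventually chosen, so values other than "true" and "false" would not require a change to the loader.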

daobrien avatar Feb 04 '20 05:02 daobrien

Hey David, thanks for starting the discussion here.

I had a brief discussion with Dan. We need to make sure that the structure will have a consistent representation of the terms that are deemed valid or invalid, but also that pre-commit will be able to determine whether the user input has been found in one or more variants of the word.

A couple of open questions:

  • Are there cases where more than one word is acceptable? If so, how do we approach that?
  • Is the list only comprised of correct and incorrect terms, or are there entries that simply provide an explanation of a term, without it being incorrect?
  • Regarding the "do not use", are those meant to be prescriptive? Are there cases where "do not use" can be used (assuming no substitute is given)?

@dkolepp feel free to add your suggestions/thoughts :)

rmahroua avatar Feb 04 '20 18:02 rmahroua

Hey David, thanks for starting the discussion here.

I had a brief discussion with Dan. We need to make sure that the structure will have a consistent representation of the terms that are deemed valid or invalid, but also that pre-commit will be able to determine whether the user input has been found in one or more variants of the word.

It's not completely consistent atm, no, but I'm working to remedy that.

A couple of open questions:

  • Are there cases where more than one word is acceptable? If so, how do we approach that?

Yes. For any such entry we can use <term role="true"> for each word. We try to avoid this and instead adhere to the "one word, one meaning" philosophy.

  • Is the list only comprised of correct and incorrect terms, or are there entries that simply provide an explanation of a term, without it being incorrect?

Some entries are comprised of just "correct/incorrect" and others are more elaborate and extend to definitions or explanations. Some entries are actually quite old and could be removed.

  • Regarding the "do not use", are those meant to be prescriptive? Are there cases where "do not use" can be used (assuming no substitute is given)?

"Do not use" means exactly that. I don't think we have any entries that do not also provide alternatives.

Remember that this Style Guide has been around for quite a while and hasn't always been updated consistently. This is our opportunity to remedy that.

@dkolepp feel free to add your suggestions/thoughts :)

daobrien avatar Feb 05 '20 03:02 daobrien

Thanks David. What I suggest is that we start with an initial review of the current list to get a feel for how many of those entries have a consistent layout versus those that have some variations.

I am not attached to any specific XML tagging (varlistentry VS a table), but @dkolepp mentioned something about the usage of a table that would be beneficial over a list.

rmahroua avatar Feb 06 '20 22:02 rmahroua

Hi Razique. I also talked to @dkolepp and we determined that we could test with one "character" - A - from the Usage Dictionary. If I refactor the XML to include the attribute discussed above, then we could write a simple rule to test against a "dummy" course, i.e., pick a course (any course?), branch it, and run our Style Guide pre-commit rules over it to see how well it works.

The entire Usage Dictionary consists of <variablelist> elements and I'd need a strong reason to rewrite that as a table. If there is another way to use a table to only provide metadata, which doesn't appear in the output, I don't know what it is.
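A hedged sketch of what such a "simple rule" might look like once the terms have been extracted: scan text line by line and report each discouraged term with its preferred replacement. The `REPLACEMENTS` mapping and `find_violations` function are hypothetical; in practice the mapping would be generated from the refactored Usage Dictionary rather than written by hand.

```python
import re

# Hypothetical mapping from discouraged term to preferred term.
REPLACEMENTS = {"air wall": "air gap"}


def find_violations(text: str, replacements: dict) -> list:
    """Return (line_number, found_term, preferred_term) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for bad, good in replacements.items():
            # Word boundaries stop a term from matching inside a longer word.
            pattern = r"\b" + re.escape(bad) + r"\b"
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                findings.append((lineno, match.group(0), good))
    return findings


sample = "The server sits behind an air wall.\nAn Air Wall is not a firewall."
for lineno, found, preferred in find_violations(sample, REPLACEMENTS):
    print(f"line {lineno}: use {preferred!r} instead of {found!r}")
```

A pre-commit hook built on this would read each file name it is given and exit with a non-zero status when any finding is reported. Matching is case-insensitive here, which is a design assumption; case-sensitive entries would need a per-term flag.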

daobrien avatar Feb 07 '20 12:02 daobrien

What would be a suitable approach to terms that are ok in some circumstances? IBM classifies these as "cautionary." I can introduce "caution" to the list of attribute values if that suits.

<varlistentry id="above">
  <term role="caution">above</term>
  <listitem>
    <para>
      Do not use to refer to information mentioned previously.
      When documents are converted to online format, the information may no longer be "above."
      Use a cross-reference if the referenced material is sufficiently removed, or write "as mentioned previously" instead.
    </para>
  </listitem>
</varlistentry>

daobrien avatar Feb 08 '20 06:02 daobrien

Are "caution" terms always accepted? In other words:

<varlistentry id="above">
  <term role="caution">above</term>

Do you have cautionary words that offer a (correct) alternative?

rmahroua avatar Feb 11 '20 01:02 rmahroua

It's hard to say. It's very much based on context. In some situations "above" might be perfectly acceptable, while in others we'd strongly recommend rephrasing. Similarly for "below"; we might suggest "the following" or a cross-reference, or something completely different depending on the example.

Instead of "caution" we might need an escape route, such as "ignore," and leave it to the humans to decide which form is correct.

daobrien avatar Feb 11 '20 05:02 daobrien

Good suggestion -- I vote for implementing an ignore flag for now, which will allow the editor to make a judgement call. That will also help us over time to refine the process and get a sense of how much human intervention is required.
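One way the ignore flag could be wired into a check, sketched under the assumption that "ignore" terms are reported for review but never fail the run. The role names come from this thread; `ROLE_ACTIONS`, `action_for`, and `should_fail` are hypothetical.

```python
# Assumed role values, taken from the attribute values discussed in this thread.
ROLE_ACTIONS = {
    "true": "accept",    # preferred term: nothing to report
    "false": "error",    # discouraged term: fail the check
    "ignore": "review",  # context-dependent: leave the decision to the editor
}


def action_for(role: str) -> str:
    """Unknown roles fall back to human review rather than failing the check."""
    return ROLE_ACTIONS.get(role, "review")


def should_fail(roles_found: list) -> bool:
    """A check fails only when at least one 'false' term was found."""
    return any(action_for(role) == "error" for role in roles_found)


print(should_fail(["ignore", "true"]))
print(should_fail(["ignore", "false"]))
```

Counting how often the "review" action fires would give the sense of required human intervention mentioned above.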

Thoughts @dkolepp?

rmahroua avatar Feb 18 '20 18:02 rmahroua

@daobrien @rmahroua - no strong opinion or intuition at this point. Give something a try...

dkolepp avatar Feb 20 '20 22:02 dkolepp

So, the definitive structure would be:

<varlistentry id="air-gap">
  <term role="ignore">air gap</term>
  <listitem>
    <para>
      <emphasis>n.</emphasis> Use "air gap" to describe systems that are separated, not by software, but physically. Do not use "air wall." "Air gap" is preferred in technical publications because there is no actual wall that you need to breach, but rather a gap that you need to bridge. You cannot break through something that does not exist.
    </para>
  </listitem>
</varlistentry>

In this case, role="ignore" indicates that manual intervention from the editor is required. If no intervention is required:

<varlistentry id="air-gap">
  <term role="true">air gap</term>
  <term role="false">air wall</term>
  <listitem>
    <para>
      <emphasis>n.</emphasis> Use "air gap" to describe systems that are separated, not by software, but physically. Do not use "air wall." "Air gap" is preferred in technical publications because there is no actual wall that you need to breach, but rather a gap that you need to bridge. You cannot break through something that does not exist.
    </para>
  </listitem>
</varlistentry>

Does that look good to you? If so, could you create a test file with that structure?

rmahroua avatar Feb 21 '20 00:02 rmahroua

@daobrien @rmahroua - no strong opinion or intuition at this point. Give something a try...

dkolepp avatar Feb 21 '20 14:02 dkolepp

It looks like this is going to go in a completely different direction based on what's possible with Vale. @dkolepp and I are going to meet in late September to investigate further.

daobrien avatar Sep 11 '20 02:09 daobrien

This is on hold. Other tools (e.g. Vale) and resources (the internal IBM Style Guide) are hopefully just around the corner and will impact how this issue is addressed.

daobrien avatar Jan 31 '21 22:01 daobrien