Markdown updates for 3.0.4

Open lornajane opened this issue 1 year ago • 3 comments

We need to handle the changes to anchor links and headings (see #3548 for context). We also have #3596 to add tooling, which I will work on after we've done this initial cleanup.

The commits tell their own story, but this is all very repeatable and hopefully transferable between branches. Shout out to @handrews who got me started with the anchor/link rewriting script.

The process goes like this:

  1. Run prettier for formatting

    • prettier --write --single-quote 3.0.4.md
  2. Run markdownlint to fix whatever it can

    • markdownlint --fix 3.0.4.md
    • Configuration file .markdownlint.yaml:
            MD007:
              indent: 2
      
            MD012: false # allow blank lines
      
            MD013:
              line_length: 800
              tables: false
      
            MD024: false # duplicate headings
            MD033: false # inline HTML
      
  3. Manually fix additional markdownlint problems

    • heading levels aren't continuous (MD001)
    • code fences need a language (MD040)
    • table has the wrong number of cells (MD056)
  4. Take out the table of contents and its comments, replace with a single <!-- toc -->

  5. Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links. Basic idea:

    • make all our anchor links kebab-case
    • update all other internal document links
    • remove the ones that are in headings
    • add a lookup/override to fix things that either were inflected differently or we were using an anchor link that didn't match the title
    • (script is in comment)
    • use markdownlint again to check that this all worked because it can check internal links
  6. Put the TOC in with all the new structure and links using markdown-toc

    • markdown-toc --maxdepth 4 -i 3.0.4.md
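
The internal link check from step 5 can also be approximated outside markdownlint. A minimal sketch, assuming GitHub-style heading slugs (lowercase, punctuation dropped, spaces turned into hyphens); `slugify` and `broken_links` are hypothetical helper names, and the slug rule is a simplification that ignores duplicate-heading suffixes and headings inside code fences:

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style anchor for a heading's text."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens and spaces
    return text.replace(" ", "-")


def broken_links(markdown: str) -> list[str]:
    """Return internal link targets that match no heading slug or <a name> anchor."""
    targets = {
        slugify(m.group(1))
        for m in re.finditer(r"^#{1,6}[ \t]+(.*)$", markdown, re.MULTILINE)
    }
    targets |= set(re.findall(r'<a name="([^"]*)"', markdown))
    links = re.findall(r"\]\(#([^)]*)\)", markdown)
    return sorted({link for link in links if link not in targets})


sample = """## Runtime Expressions
See [expressions](#runtime-expressions) and [codes](#httpCodes).
"""
print(broken_links(sample))  # ['httpCodes']
```

A link that still uses the old camelCase anchor shows up in the result, which is the kind of leftover the final markdownlint run is meant to catch.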

It's 850 lines of change, I don't know how we're going to review it, but take a look!

lornajane avatar Jun 26 '24 20:06 lornajane

Super special python script:

from sys import argv
from pathlib import Path
import re

# this script tries to inflect the old links, some are just missing and we need to use the title's version instead
updates = {}
updates["revision-history"] = "appendix-a-revision-history"
updates["data-type-conversion"] = "appendix-b-data-type-conversion"
updates["using-r-f-c6570-implementations"] = "appendix-c-using-rfc6570-implementations"
updates["serializing-headers-and-cookies"] = "appendix-d-serializing-headers-and-cookies"
updates["percent-encoding-and-form-media-types"] = "appendix-e-percent-encoding-and-form-media-types"
updates["document-structure"] = "openapi-description-structure"
updates["oas-object"] = "openapi-object"
updates["components-security-schemes"] = "security-scheme-object"
updates["schema-composition"] = "composition-and-inheritance-polymorphism"
updates["http-codes"] = "http-status-codes"
updates["oas-document"] = "openapi-description"
updates["rich-text"] = "rich-text-formatting"
updates["relative-references"] = "relative-references-in-urls"
updates["runtime-expression"] = "runtime-expressions"
updates["runtime-expression-examples"] = "examples"



def kebab_it(c):
    """Return '-' plus the lowercased character if c is uppercase, else c unchanged."""
    if c.lower() != c:
        return f'-{c.lower()}'
    return c

if __name__ == '__main__':
    text = Path(argv[1]).read_text()

    names = {}
    removals = {}
    for match in re.finditer(r'\n(.*)<a name="([^"]*)"', text):
        name = match.group(2)
        names[name] = ''.join([kebab_it(c) for c in name])

        # was it a heading? file it for removal
        if len(match.group(1)) and match.group(1)[0] == "#":
            removals[name] = True

    for current, replacement in names.items():
        if replacement in updates:
            replacement = updates[replacement]
        text = text.replace(f'(#{current})', f'(#{replacement})')

        # only remove if removal is indicated, otherwise update
        if current in removals:
            text = text.replace(f'<a name="{current}"></a>', '')
        else:
            text = text.replace(f'<a name="{current}"></a>', f'<a name="{replacement}"></a>')
            

    print(text)

Run it like: python kebab_it.py 3.0.4.md > temp.md and then if temp.md looks good, copy it back over 3.0.4.md
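
To make the inflection step concrete, here is a small sketch of what `kebab_it` does to two anchor names. The sample names are illustrative assumptions chosen to match the `updates` table above, and `to_kebab` is a hypothetical wrapper, not part of the script:

```python
def kebab_it(c):
    # same rule as the script: an uppercase character becomes "-" plus its lowercase form
    if c.lower() != c:
        return f'-{c.lower()}'
    return c


def to_kebab(name):
    return ''.join(kebab_it(c) for c in name)


# only one entry of the override table is reproduced here
updates = {"using-r-f-c6570-implementations": "appendix-c-using-rfc6570-implementations"}

for name in ["dataTypes", "usingRFC6570Implementations"]:
    kebab = to_kebab(name)
    print(name, "->", updates.get(kebab, kebab))
```

The second name shows why the lookup is needed: each capital in "RFC" gets its own hyphen, so the inflected form has to be mapped onto the slug of the real heading.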

lornajane avatar Jun 26 '24 20:06 lornajane

Do we need to keep the TOC? GitHub creates one for us ...

mikekistler avatar Jun 27 '24 15:06 mikekistler

@mikekistler we've said the HTML rendering is authoritative now, so yes. (but it's a worthwhile question!)

handrews avatar Jun 27 '24 15:06 handrews

About the TOC, if I understood the thread correctly, respec builds a TOC as well, so we do not need one in our source at all

lornajane avatar Jun 30 '24 20:06 lornajane

I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update. Thanks @ralfhandl for getting into the respec details, I hadn't got there yet!

Changed items:

  • rebased to pick up newest additions to the dev branch
  • switched required bullet point format to * and made the toc command use this format as well
  • reapplied all other changes

It did occur to me that we could keep the <!-- toc --> notation in the source and just generate that bit when we're building the HTML version. It's the last commit on the branch here anyway, so super easy to remove.
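
Generating the TOC at build time could look roughly like the sketch below. It is not what markdown-toc does internally: `build_toc` and `fill_toc` are hypothetical names, starting at level-2 headings is an assumption, the slug rule is a simplified GitHub-style approximation, and headings inside code fences are not handled.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style anchor for a heading's text."""
    text = re.sub(r"[^\w\- ]", "", heading.strip().lower())
    return text.replace(" ", "-")


def build_toc(markdown: str, max_depth: int = 4) -> str:
    """Build a '*'-bulleted TOC from ATX headings, starting at level 2."""
    lines = []
    for match in re.finditer(r"^(#{2,6})[ \t]+(.+)$", markdown, re.MULTILINE):
        level, title = len(match.group(1)), match.group(2).strip()
        if level <= max_depth:
            indent = "  " * (level - 2)  # two spaces per nesting level
            lines.append(f"{indent}* [{title}](#{slugify(title)})")
    return "\n".join(lines)


def fill_toc(markdown: str) -> str:
    """Replace the first <!-- toc --> marker with the generated TOC."""
    return markdown.replace("<!-- toc -->", build_toc(markdown), 1)


doc = "# Spec\n\n<!-- toc -->\n\n## Introduction\n\n### Versions\n"
print(fill_toc(doc))
```

With this approach the source keeps only the marker, and the list of links is produced fresh whenever the HTML is built.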

lornajane avatar Jul 01 '24 20:07 lornajane

> I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update.

Our HTML build script removes the table of contents from the Markdown. The ToC we see, for example in https://spec.openapis.org/oas/latest.html, is generated by ReSpec from the section headings.

Here's the relevant part of our build script:

https://github.com/OAI/OpenAPI-Specification/blob/6bd37a356740b8128bfcca02b2d12d745def5135/scripts/md2html/md2html.js#L176-L178

Every line between lines starting with ## Table of Contents and <!-- /TOC is removed, including these lines.

With the current change from <!-- /TOC to <!-- tocstop --> everything except the first 18 lines would be removed unless we adjust the build script to the new ToC tool.
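
The failure mode described above can be reproduced with a small Python sketch. This is not the actual build script (that is the JavaScript file linked above); `strip_toc` is a hypothetical stand-in that only mimics the described behaviour of dropping everything from the start line through the end marker:

```python
def strip_toc(lines, start="## Table of Contents", end="<!-- /TOC"):
    """Drop every line from the one starting with `start` through the one starting with `end`."""
    kept, skipping = [], False
    for line in lines:
        if line.startswith(start):
            skipping = True
        if not skipping:
            kept.append(line)
        if skipping and line.startswith(end):
            skipping = False  # the end marker itself is dropped too
    return kept


old_style = ["# Spec", "## Table of Contents", "* [Intro](#intro)", "<!-- /TOC -->", "## Intro"]
new_style = ["# Spec", "## Table of Contents", "* [Intro](#intro)", "<!-- tocstop -->", "## Intro"]

print(strip_toc(old_style))  # ['# Spec', '## Intro']
print(strip_toc(new_style))  # ['# Spec']
```

Because the end marker is never found in the second case, skipping never stops and the rest of the document is lost.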

A better way forward would be to completely remove the "Table of Contents" subsection because

  • we don't need it for producing the published HTML
  • we don't need it for viewing the raw *.md files in GitHub because the GitHub Markdown viewer now has an Outline pane on the right with an auto-generated table of contents.

ralfhandl avatar Jul 02 '24 15:07 ralfhandl

Should also tackle #1720

lornajane avatar Jul 04 '24 16:07 lornajane

My notes on the process that @lornajane put in the PR description:

git checkout v3.0.4-dev
git checkout -b fix-markdown
cd versions

  1. Run prettier for formatting

npx prettier --write --single-quote 3.0.4.md

  2. Run markdownlint to fix whatever it can

Create markdown-lint.yaml with contents given, then

npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md

  3. Take out the table of contents and its comments, replace with a single <!-- toc -->

awk '/## Table of Contents/{f=1} /<!-- \/TOC/{f=0; print "<!-- toc -->"; next} {if (f==0) {print}} ' 3.0.4.md > temp.md; mv temp.md 3.0.4.md

  4. Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links.

python kebab_it.py 3.0.4.md > temp.md; mv temp.md 3.0.4.md
npx markdownlint-cli --fix 3.0.4.md

  • use markdownlint again to check that this all worked because it can check internal links

npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md
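
For anyone repeating these notes, the command sequence can be wrapped in a short Python driver. This is a sketch only: `build_commands` and `run_pipeline` are hypothetical names, it assumes `npx` and `kebab_it.py` are available in the working directory, and it leaves out the table-of-contents replacement step.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(spec, config="markdown-lint.yaml"):
    """Return the prettier and markdownlint commands from the notes as argv lists."""
    prettier = ["npx", "prettier", "--write", "--single-quote", spec]
    lint = ["npx", "markdownlint-cli", "--config", config, "--fix", spec]
    return prettier, lint


def run_pipeline(spec, config="markdown-lint.yaml", script="kebab_it.py"):
    """Format, lint, rewrite anchors, then lint again; return the final lint exit code."""
    prettier, lint = build_commands(spec, config)
    subprocess.run(prettier, check=True)
    subprocess.run(lint)  # may exit non-zero while manual fixes are still needed
    # kebab_it.py prints the rewritten document, so capture stdout and write it back
    result = subprocess.run([sys.executable, script, spec],
                            check=True, capture_output=True, text=True)
    Path(spec).write_text(result.stdout)
    # lint again: markdownlint can report internal links that no longer resolve
    return subprocess.run(lint).returncode


# Example (not executed here): run_pipeline("3.0.4.md")
```

Writing the captured output back only after the script succeeds plays the same role as the temp.md step in the notes, since the original file is never truncated by a failed run.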

mikekistler avatar Jul 25 '24 17:07 mikekistler

Updated to apply the changes to the newest 3.0.4 version, and marked as ready to review as I don't think we have any more changes in flight for this branch.

lornajane avatar Aug 01 '24 19:08 lornajane