Markdown updates for 3.0.4
We need to handle the changes to anchor links and headings (see #3548 for context). We also have #3596 to add tooling, which I will work on after we've done this initial cleanup.
The commits tell their own story, but this is all very repeatable and hopefully transferable between branches. Shout out to @handrews who got me started with the anchor/link rewriting script.
The process goes like this:
- Run prettier for formatting
  - prettier --write --single-quote 3.0.4.md
- Run markdownlint to fix whatever it can
  - markdownlint --fix 3.0.4.md
  - Configuration file .markdownlint.yaml:
      MD007:
        indent: 2
      MD012: false # allow blank lines
      MD013:
        line_length: 800
        tables: false
      MD024: false # duplicate headings
      MD033: false # inline HTML
- Manually fix additional markdownlint problems
  - heading levels aren't continuous MD001
  - code fences need a language MD040
  - table has the wrong number of cells MD056
- Take out the table of contents and its comments, replace with a single <!-- toc -->
- Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links. Basic idea:
  - make all our anchor links kebab-case
  - update all other internal document links
  - remove the ones that are in headings
  - add a lookup/override to fix things that either were inflected differently or we were using an anchor link that didn't match the title
  - (script is in comment)
  - use markdownlint again to check that this all worked because it can check internal links
- Put the TOC in with all the new structure and links using markdown-toc
  - markdown-toc --maxdepth 4 -i 3.0.4.md
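For anyone who wants a quick sanity check on internal links without running markdownlint, here is a minimal sketch of the idea. It is not what markdownlint does internally: `slugify` only approximates GitHub-style heading anchors, and both function names and the sample text are made up for illustration.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (an assumption, not the exact algorithm)."""
    slug = heading.strip().lower()
    slug = re.sub(r'[^\w\- ]', '', slug)  # drop punctuation such as ':' or '.'
    return slug.replace(' ', '-')


def broken_links(markdown: str) -> list:
    """Return internal link targets that match no heading slug and no <a name> anchor."""
    headings = re.finditer(r'^#+[ \t]+(.*)$', markdown, re.MULTILINE)
    targets = {slugify(m.group(1)) for m in headings}
    targets |= set(re.findall(r'<a name="([^"]*)"', markdown))
    links = re.findall(r'\]\(#([^)]*)\)', markdown)
    return sorted({link for link in links if link not in targets})


# Hypothetical sample: one link uses the new kebab-case anchor, one uses an old name.
sample = """# Appendix A: Revision History
See [history](#appendix-a-revision-history) and [old](#revisionHistory).
"""
print(broken_links(sample))
```

Anything this prints is a link whose target no longer exists, which is the kind of problem the final markdownlint pass is meant to catch.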
It's 850 lines of change, I don't know how we're going to review it, but take a look!
Super special python script:
from sys import argv
from pathlib import Path
import re
# this script tries to inflect the old links, some are just missing and we need to use the title's version instead
updates = {}
updates["revision-history"] = "appendix-a-revision-history"
updates["data-type-conversion"] = "appendix-b-data-type-conversion"
updates["using-r-f-c6570-implementations"] = "appendix-c-using-rfc6570-implementations"
updates["serializing-headers-and-cookies"] = "appendix-d-serializing-headers-and-cookies"
updates["percent-encoding-and-form-media-types"] = "appendix-e-percent-encoding-and-form-media-types"
updates["document-structure"] = "openapi-description-structure"
updates["oas-object"] = "openapi-object"
updates["components-security-schemes"] = "security-scheme-object"
updates["schema-composition"] = "composition-and-inheritance-polymorphism"
updates["http-codes"] = "http-status-codes"
updates["oas-document"] = "openapi-description"
updates["rich-text"] = "rich-text-formatting"
updates["relative-references"] = "relative-references-in-urls"
updates["runtime-expression"] = "runtime-expressions"
updates["runtime-expression-examples"] = "examples"
def kebab_it(c):
    if c.lower() != c:
        return f'-{c.lower()}'
    return c

if __name__ == '__main__':
    text = Path(argv[1]).read_text()
    names = {}
    removals = {}
    for match in re.finditer(r'\n(.*)<a name="([^"]*)"', text):
        name = match.group(2)
        names[name] = ''.join([kebab_it(c) for c in name])
        # was it a heading? file it for removal
        if len(match.group(1)) and match.group(1)[0] == "#":
            removals[name] = True
    for current, replacement in names.items():
        if replacement in updates:
            replacement = updates[replacement]
        text = text.replace(f'(#{current})', f'(#{replacement})')
        # only remove if removal is indicated, otherwise update
        if current in removals:
            text = text.replace(f'<a name="{current}"></a>', '')
        else:
            text = text.replace(f'<a name="{current}"></a>', f'<a name="{replacement}"></a>')
    print(text)
Run it like: python kebab_it.py 3.0.4.md > temp.md and then if temp.md looks good, copy it back over 3.0.4.md
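To see why the override table is needed, here is a small standalone sketch of the inflection step. `kebab_it` is copied from the script; `inflect`, the trimmed `overrides` dict, and the camelCase input names are assumptions for illustration (the old anchor names are inferred from the override keys, not checked against the spec).

```python
def kebab_it(c):
    # Same rule as the script: every uppercase character becomes '-' plus its lowercase form.
    if c.lower() != c:
        return f'-{c.lower()}'
    return c


def inflect(name, overrides):
    # Hypothetical helper: kebab-case the old anchor, then apply any manual override.
    kebab = ''.join(kebab_it(c) for c in name)
    return overrides.get(kebab, kebab)


# Two entries taken from the script's lookup table.
overrides = {
    'using-r-f-c6570-implementations': 'appendix-c-using-rfc6570-implementations',
    'oas-object': 'openapi-object',
}

# Assumed old-style anchor names.
for old in ['usingRFC6570Implementations', 'oasObject', 'schemaObject']:
    print(old, '->', inflect(old, overrides))
```

Runs of capitals such as "RFC" are split letter by letter, which is why that anchor needs a manual entry rather than relying on the automatic conversion.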
Do we need to keep the TOC? GitHub creates one for us ...
@mikekistler we've said the HTML rendering is authoritative now, so yes. (but it's a worthwhile question!)
About the TOC, if I understood the thread correctly, respec builds a TOC as well, so we do not need one in our source at all
I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update. Thanks @ralfhandl for getting into the respec details, I hadn't got there yet!
Changed items:
- rebased to pick up newest additions to the dev branch
- switched required bullet point format to * and made the toc command use this format as well
- reapplied all other changes
It did occur to me that we could keep the <!-- toc --> notation in the source and just generate that bit when we're building the HTML version. It's the last commit on the branch here anyway, so super easy to remove.
> I'm not sure respec does the table of contents in a way we can use, so I've regenerated it in this update.
Our HTML build script removes the table of contents from the Markdown. The ToC we see, for example in https://spec.openapis.org/oas/latest.html, is generated by ReSpec from the section headings.
Here's the relevant part of our build script:
https://github.com/OAI/OpenAPI-Specification/blob/6bd37a356740b8128bfcca02b2d12d745def5135/scripts/md2html/md2html.js#L176-L178
Every line from the one starting with ## Table of Contents through the one starting with <!-- /TOC is removed, including those two delimiter lines.
With the current change from <!-- /TOC to <!-- tocstop --> everything except the first 18 lines would be removed unless we adjust the build script to the new ToC tool.
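A minimal sketch of the failure mode, in Python rather than the JavaScript of the real build script. `strip_toc` is a hypothetical re-creation of the behaviour described above, not the code in md2html.js, and the sample document is made up.

```python
def strip_toc(lines, start='## Table of Contents', end='<!-- /TOC'):
    """Drop every line from the start marker through the end marker, inclusive."""
    kept, skipping = [], False
    for line in lines:
        if line.startswith(start):
            skipping = True
        if not skipping:
            kept.append(line)
        if skipping and line.startswith(end):
            skipping = False  # the end marker itself is dropped, later lines are kept
    return kept


# Made-up document that already uses the new closing marker.
doc = [
    '# Spec',
    '## Table of Contents',
    '- [Intro](#intro)',
    '<!-- tocstop -->',
    '## Intro',
    'Body',
]

print(strip_toc(doc))                       # old end marker never matches
print(strip_toc(doc, end='<!-- tocstop'))   # end marker adjusted to the new tool
```

With the old end marker the skip never ends, so only the lines before the ToC heading survive. Adjusting the marker restores the rest of the document.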
A better way forward would be to completely remove the "Table of Contents" subsection because
- we don't need it for producing the published HTML
- we don't need it for viewing the raw *.md files in GitHub because the GitHub Markdown viewer now has an Outline pane on the right with an auto-generated table of contents.
Should also tackle #1720
My notes on the process that @lornajane put in the PR description:
git checkout v3.0.4-dev
git checkout -b fix-markdown
cd versions
- Run prettier for formatting
npx prettier --write --single-quote 3.0.4.md
- Run markdownlint to fix whatever it can
Create markdown-lint.yaml with contents given, then
npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md
- Take out the table of contents and its comments, replace with a single <!-- toc -->
awk '/## Table of Contents/{f=1} /<!-- \/TOC/{f=0; print "<!-- toc -->"; next} {if (f==0) {print}}' 3.0.4.md > temp.md; mv temp.md 3.0.4.md
- Run a magical one-off script to update/fix/rewrite/remove all our anchors and internal links.
python kebab_it.py 3.0.4.md > temp.md; mv temp.md 3.0.4.md
npx markdownlint-cli --fix 3.0.4.md
- use markdownlint again to check that this all worked because it can check internal links
npx markdownlint-cli --config markdown-lint.yaml --fix 3.0.4.md
Updated to apply the changes to the newest 3.0.4 version, and marked as ready to review as I don't think we have any more changes in flight for this branch.