Sentence reformatter
To make it easier to review PRs (we may want certain SPECs to be regularly updated), we should have a sentence formatter to ensure sentences are kept on separate lines from one another. It should be added as a pre-commit hook and to the CI linter (.github/workflows/lint.yml). We will probably need to write our own tool.
Here are some sentence segmenter/reformatter links:
- https://github.com/prettier/prettier/issues/3302
- https://github.com/diasks2/pragmatic_segmenter
- https://github.com/nipunsadvilkar/pySBD
- https://pypi.org/project/pysbd/
- https://spacy.io/usage
- https://github.com/explosion/spaCy
I am not sure how best to do this. My initial thought was to combine a sentence segmenter with a pandoc filter (to work on the pandoc AST):
- https://pandoc.org/filters.html
- https://ulriklyngs.com/post/2019/02/20/how-to-use-pandoc-filters-for-advanced-customisation-of-your-r-markdown-documents/
- https://www.reddit.com/r/vim/comments/6wk9be/how_to_reformat_indented_text_for_markdown/
- http://scorreia.com/software/panflute/
- https://pypi.org/project/pandocfilters/
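Here is a rough sketch of what the pandoc-filter idea could look like, using only the standard library and pandoc's JSON AST (where inline nodes look like `{"t": "Str", "c": "word"}`, `{"t": "Space"}`, and `{"t": "SoftBreak"}`). The names `ends_sentence`, `reflow`, and `filter_stream` are made up for illustration, and the punctuation check is a naive stand-in for a real segmenter such as pySBD, so treat this as a starting point rather than a working tool.

```python
import json

SENTENCE_END = (".", "!", "?")


def ends_sentence(node):
    """Naive check (assumption): a Str inline whose text ends with . ! or ?"""
    return (
        isinstance(node, dict)
        and node.get("t") == "Str"
        and isinstance(node.get("c"), str)
        and node["c"].rstrip("\"')]").endswith(SENTENCE_END)
    )


def reflow(node):
    """Walk the AST; put a SoftBreak after each sentence and a Space elsewhere."""
    if isinstance(node, list):
        out = []
        for child in node:
            child = reflow(child)
            if isinstance(child, dict) and child.get("t") in ("Space", "SoftBreak"):
                prev = out[-1] if out else None
                child = {"t": "SoftBreak"} if ends_sentence(prev) else {"t": "Space"}
            out.append(child)
        return out
    if isinstance(node, dict):
        return {key: reflow(value) for key, value in node.items()}
    return node


def filter_stream(src, dst):
    """Read a pandoc JSON document from src and write the reflowed one to dst."""
    json.dump(reflow(json.load(src)), dst)
```

A script that calls `filter_stream(sys.stdin, sys.stdout)` could then be passed to `pandoc --filter`, together with `--wrap=preserve` so that soft breaks are written out as newlines. This sketch misses sentences that end inside emphasis (for example `**done.**`) and splits wrongly after abbreviations, which is where a proper segmenter would come in.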
I suspect it will be easiest to just go with one sentence per line:
- https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line
- https://algorithmicallyrandom.blogspot.com/2014/03/one-sentence-per-line.html
When I write documents by myself, I tend to use semantic linefeeds (aka ventilated prose):
- https://rhodesmill.org/brandon/2012/one-sentence-per-line/
- https://vanemden.wordpress.com/2009/01/01/ventilated-prose/
But I suspect doing that is easier by hand (so the linefeeds happen in a reasonable place). We could just wrap sentences individually. But then we may need to worry about indentation. For example, I believe
1. Support for a given version of Python be dropped **3 years** after its initial release.
2. Support for a given version of other core packages be dropped **2 years** after their initial release.
should be formatted like
1. Support for a given version of Python be dropped **3 years** after
   its initial release.
2. Support for a given version of other core packages be dropped
   **2 years** after their initial release.
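Wrapping each item with a hanging indent might be sketched with Python's `textwrap` module as below. The helper name `wrap_item`, the list-marker regex, and the width of 72 are assumptions for illustration only.

```python
import re
import textwrap

# Leading list marker such as "1. ", "- " or "* ", including its indentation.
LIST_MARKER = re.compile(r"^(\s*(?:\d+\.|[-*+])\s+)")


def wrap_item(line, width=72):
    """Wrap a single-line list item (or paragraph) with a hanging indent."""
    match = LIST_MARKER.match(line)
    marker = match.group(1) if match else ""
    return textwrap.fill(
        line[len(marker):],
        width=width,
        initial_indent=marker,
        # Continuation lines align with the text after the marker.
        subsequent_indent=" " * len(marker),
        break_long_words=False,
        break_on_hyphens=False,
    )


item = "1. Support for a given version of Python be dropped **3 years** after its initial release."
print(wrap_item(item))
```

Note that `textwrap` breaks at any whitespace, so it can split an emphasized span such as `**2 years**` across two lines, which a person wrapping by hand would avoid.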
And I am sure the examples can be much more complicated. Pandoc may just handle this for us. If not, I suspect using one sentence per line may get complicated. But maybe it is already as complicated as it gets. For example,
1. Support for a given version of Python be dropped **3 years** after
   its initial release. Support for a given version of other core packages
   be dropped **2 years** after their initial release.
would need to be reformatted to
1. Support for a given version of Python be dropped **3 years** after its initial release.
   Support for a given version of other core packages be dropped **2 years** after their initial release.
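That reformatting step, working on plain text instead of the pandoc AST, might look something like the sketch below. Everything here is hypothetical: the function names, the list-marker regex, and especially the sentence-boundary regex, which is a crude placeholder for a real segmenter such as pySBD.

```python
import re

# Naive sentence boundary (an assumption for this sketch): ., ! or ? followed
# by whitespace and then an uppercase letter or "*". Abbreviations such as
# "e.g. Python" would be split wrongly; a real segmenter would replace this.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z*])")

# Leading list marker such as "1. ", "- " or "* ", including its indentation.
LIST_MARKER = re.compile(r"^(\s*(?:\d+\.|[-*+])\s+)")


def one_sentence_per_line(block):
    """Reflow one paragraph or list item (no trailing newline) to one sentence per line."""
    lines = block.splitlines()
    if not lines:
        return block
    match = LIST_MARKER.match(lines[0])
    marker = match.group(1) if match else ""
    parts = [lines[0][len(marker):]] + lines[1:]
    body = " ".join(part.strip() for part in parts if part.strip())
    sentences = BOUNDARY.split(body)
    # Later sentences are indented so they stay inside the list item.
    indent = " " * len(marker)
    return "\n".join(
        (marker if i == 0 else indent) + sentence
        for i, sentence in enumerate(sentences)
    )


def needs_reflow(block):
    """True if the block is not already in one-sentence-per-line form."""
    return one_sentence_per_line(block) != block
```

A check like `needs_reflow` is the piece a pre-commit hook or the CI linter could call to fail on unformatted text, while `one_sentence_per_line` would do the actual fix. Splitting a file into paragraphs and list items (and skipping code blocks, tables, and headings) is left out here and is probably where most of the complexity lives.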
I tend to write long sentences, so one sentence per line can get annoying if your editor doesn't wrap text for you. However, maybe we should encourage folks to avoid long sentences. I would like the SPECs to be clear and easy to understand. Anything that encourages shorter sentences may help.
Additional links I am not sure where to put:
- https://github.com/lervag/vimtex/issues/1416
- https://abizjak.github.io/emacs/2016/03/06/latex-fill-paragraph.html
https://atom.io/packages/flowmark
Looks like that project is gone :/
https://github.com/nipunsadvilkar/pySBD