Sentence reformatter

Status: Open. jarrodmillman opened this issue 4 years ago • 3 comments

To make it easier to review PRs (we may want certain SPECs to be updated regularly), we should have a sentence formatter that keeps each sentence on its own line. It should be added as a pre-commit hook and to the CI linter (.github/workflows/lint.yml). We will probably need to write our own tool.
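
As a rough idea of what the hook and CI step could check, here is a minimal sketch that flags lines appearing to contain more than one sentence. It assumes a naive boundary rule (`.`, `?`, or `!`, then whitespace, then an uppercase letter), so abbreviations like "e.g. Python" give false positives, and it does not skip code blocks or tables. The function names are hypothetical.

```python
import re
import sys

# Strip a leading list marker ("1. ", "- ", "* ") so "1. Support" is not
# mistaken for a sentence boundary.
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*+])\s+")
# Assumption: naive boundary = ., ?, or ! followed by whitespace and an uppercase letter.
# Abbreviations such as "e.g. Python" will be reported as false positives.
BOUNDARY = re.compile(r"[.!?]\s+(?=[A-Z])")


def multi_sentence_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that seem to hold more than one sentence."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if BOUNDARY.search(LIST_MARKER.sub("", line, count=1))
    ]


def main(paths: list[str]) -> int:
    """Print one message per offending line; return 1 if any were found."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno in multi_sentence_lines(f.read()):
                print(f"{path}:{lineno}: more than one sentence on this line")
                status = 1
    return status


# pre-commit passes the staged file names as arguments; with none, do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Something like this could be registered as a `repo: local` hook in `.pre-commit-config.yaml`, and the lint workflow could then run it through pre-commit, so both places share one check.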

Here are some sentence segmenter/reformatter links:

  • https://github.com/prettier/prettier/issues/3302
  • https://github.com/diasks2/pragmatic_segmenter
  • https://github.com/nipunsadvilkar/pySBD
  • https://pypi.org/project/pysbd/
  • https://spacy.io/usage
  • https://github.com/explosion/spaCy
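
To see why a dedicated segmenter such as pySBD or spaCy is worth considering, here is a deliberately small rule-based splitter. It assumes English text and a hand-maintained abbreviation list (the list below is illustrative, not complete), and splits after `.`, `?`, or `!` followed by whitespace unless the token ending at the period is a known abbreviation.

```python
import re

# Illustrative only (an assumption): real segmenters ship much larger,
# language-specific rules.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "vs.", "cf.", "dr.", "mr.", "mrs."}


def split_sentences(text: str) -> list[str]:
    """Split after ., ?, or ! plus whitespace, unless the period ends a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start() + 1  # keep the punctuation with its sentence
        # Last whitespace-separated token before the break, e.g. "(e.g." -> "e.g."
        token = text[start:end].split()[-1].lstrip("([{\"'").lower()
        if token in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Use a tool, e.g. pySBD. It handles this. Does it?"))
```

The weak spot is obvious: `etc.` can also end a sentence, and this rule will never split there. Ambiguities like that are what the rule sets in pySBD and pragmatic_segmenter are designed to handle.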

I am not sure how best to do this. My initial thought was to combine a sentence segmenter with a pandoc filter (to work on the pandoc AST):

  • https://pandoc.org/filters.html
  • https://ulriklyngs.com/post/2019/02/20/how-to-use-pandoc-filters-for-advanced-customisation-of-your-r-markdown-documents/
  • https://www.reddit.com/r/vim/comments/6wk9be/how_to_reformat_indented_text_for_markdown/
  • http://scorreia.com/software/panflute/
  • https://pypi.org/project/pandocfilters/
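
One way to combine the two ideas is sketched below, assuming pandoc's JSON filter protocol (the AST arrives as JSON on stdin, and the filter writes the modified AST to stdout). The sketch walks every `Para` and `Plain` block, turns the `Space` after a `Str` ending in `.`, `?`, or `!` into a `SoftBreak`, and turns any other top-level `SoftBreak` into a `Space`. Rendering with `--wrap=preserve` should then give one sentence per line, with pandoc's Markdown writer handling list indentation. The punctuation test stands in for a real segmenter. Only top-level inlines are touched, so a sentence ending inside emphasis (e.g. `**3 years.**`) is missed. The file name `ventilate.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter sketch: one sentence per line (naive sentence detection)."""
import json
import sys

# Assumption: a Str ending in one of these characters ends a sentence.
SENTENCE_END = (".", "?", "!")


def ventilate(inlines: list) -> list:
    """Rewrite a paragraph's top-level breaks: SoftBreak after a sentence, Space elsewhere."""
    out = []
    for i, node in enumerate(inlines):
        if node.get("t") in ("Space", "SoftBreak"):
            prev = inlines[i - 1] if i > 0 else {}
            ends_sentence = prev.get("t") == "Str" and prev["c"].endswith(SENTENCE_END)
            out.append({"t": "SoftBreak" if ends_sentence else "Space"})
        else:
            out.append(node)
    return out


def walk(node):
    """Recursively apply ventilate() to every Para and Plain block in the AST."""
    if isinstance(node, list):
        return [walk(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") in ("Para", "Plain"):
            node = {**node, "c": ventilate(node["c"])}
        return {key: walk(value) for key, value in node.items()}
    return node


# pandoc invokes JSON filters with the output format as argv[1]; without it, do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

It would be run as `pandoc spec.md --filter ./ventilate.py --wrap=preserve -t markdown -o out.md`. The output goes through pandoc's Markdown writer, which may normalize other formatting too, so the result would need a careful diff before we rely on this approach.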

I suspect it will be easiest to just go with one sentence per line:

  • https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line
  • https://algorithmicallyrandom.blogspot.com/2014/03/one-sentence-per-line.html

When I write documents by myself, I tend to use semantic linefeeds (aka ventilated prose):

  • https://rhodesmill.org/brandon/2012/one-sentence-per-line/
  • https://vanemden.wordpress.com/2009/01/01/ventilated-prose/

But I suspect that is easier to do by hand (so the line breaks fall in reasonable places). We could just wrap each sentence individually, but then we may need to worry about indentation. For example, I believe

1. Support for a given version of Python be dropped **3 years** after its initial release.
2. Support for a given version of other core packages be dropped **2 years** after their initial release.

should be formatted like

1. Support for a given version of Python be dropped **3 years** after
   its initial release.
2. Support for a given version of other core packages be dropped
   **2 years** after their initial release.
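
The hanging indent in the example above can be produced by a small greedy wrapper. This sketch rests on two assumptions the issue does not state: a width of 72 columns, and that a `**bold**` span is never split across lines (which is what keeps `**2 years**` together in the second item). `wrap_item` is a hypothetical helper. Python's `textwrap.fill` with `initial_indent` and `subsequent_indent` does the same job if splitting bold spans is acceptable.

```python
import re

# Assumption: a **bold span** (plus any punctuation glued to it) is one
# unbreakable token; everything else is split on whitespace.
TOKEN = re.compile(r"\*\*.+?\*\*\S*|\S+")


def wrap_item(marker: str, text: str, width: int = 72) -> str:
    """Greedily wrap one list item, indenting continuation lines under the text."""
    indent = " " * len(marker)
    lines, current, empty = [], marker, True
    for token in TOKEN.findall(text):
        if empty:
            current += token  # first word goes right after the marker
        elif len(current) + 1 + len(token) > width:
            lines.append(current)
            current = indent + token
        else:
            current += " " + token
        empty = False
    lines.append(current)
    return "\n".join(lines)


print(wrap_item("1. ", "Support for a given version of Python be dropped "
                       "**3 years** after its initial release."))
print(wrap_item("2. ", "Support for a given version of other core packages "
                       "be dropped **2 years** after their initial release."))
```

With `width=72`, both calls reproduce the wrapped items above line for line.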

And I am sure real examples can be much more complicated. Pandoc may just handle this for us. If not, I suspect one sentence per line may get complicated, although maybe this is already as complicated as it gets. For example,

1. Support for a given version of Python be dropped **3 years** after
   its initial release. Support for a given version of other core packages
   be dropped **2 years** after their initial release.

would need to be reformatted to

1. Support for a given version of Python be dropped **3 years** after its initial release.
   Support for a given version of other core packages be dropped **2 years** after their initial release.
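
The transformation in this example can be sketched in four steps: detect the list marker on the first line, join the item's lines, split the text into sentences, and emit the first sentence after the marker with every later sentence indented to match. This minimal sketch assumes a naive capital-letter boundary rule and takes a single paragraph or list item as input. `one_sentence_per_line` is a hypothetical name, and nested lists, code spans, and abbreviations are not handled.

```python
import re

# Assumed list markers: "1. ", "1) ", "- ", "* ", "+ " at the start of the first line.
MARKER = re.compile(r"^(\s*(?:\d+[.)]|[-*+])\s+)")
# Naive boundary (an assumption): ., ?, or ! then whitespace then an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def one_sentence_per_line(block: str) -> str:
    """Reflow one paragraph or list item so that every sentence starts a new line."""
    lines = block.splitlines()
    if not lines:
        return block
    first = lines[0]
    match = MARKER.match(first)
    # Prefix for the first sentence: the list marker, or the paragraph's own indent.
    prefix = match.group(1) if match else first[: len(first) - len(first.lstrip())]
    # Later sentences line up with the text after the marker.
    indent = " " * len(prefix) if match else prefix
    text = " ".join(line.strip() for line in [first[len(prefix):], *lines[1:]])
    sentences = BOUNDARY.split(text)
    return "\n".join((prefix if i == 0 else indent) + s for i, s in enumerate(sentences))
```

Applied to the three-line item above, this returns the two-line version, with the second sentence indented three spaces under the text after `1. `.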

I tend to write long sentences, so one sentence per line can get annoying if your editor doesn't wrap text for you. However, maybe we should encourage folks to avoid long sentences. I would like the SPECs to be clear and easy to understand, and anything that encourages shorter sentences may help.
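
If we do want to nudge folks toward shorter sentences, the same sentence splitting could feed a simple report. This is a minimal sketch that assumes a naive boundary rule (punctuation, whitespace, uppercase letter) and an arbitrary 30-word limit. Both `long_sentences` and the threshold are hypothetical; nothing in the SPECs defines them.

```python
import re

# Naive boundary (an assumption): ., ?, or ! then whitespace then an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def long_sentences(text: str, max_words: int = 30) -> list[tuple[int, str]]:
    """Return (word count, sentence) for every sentence longer than max_words."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    report = []
    for sentence in BOUNDARY.split(flat):
        count = len(sentence.split())
        if count > max_words:
            report.append((count, sentence))
    return report
```

A report like this would probably work better as an informational warning than as a failing CI check, since some long sentences are fine.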

Additional links I am not sure where to put:

  • https://github.com/lervag/vimtex/issues/1416
  • https://abizjak.github.io/emacs/2016/03/06/latex-fill-paragraph.html

jarrodmillman, Feb 09 '21 19:02

https://atom.io/packages/flowmark

jarrodmillman, Feb 10 '21 00:02

> https://atom.io/packages/flowmark

Looks like that project is gone :/

stefanv, Sep 20 '24 16:09

https://github.com/nipunsadvilkar/pySBD

stefanv, Sep 20 '24 17:09