
[feature-proposal] streamline module-structure

Open orgua opened this issue 2 years ago • 1 comments

While modernizing the codebase, I had some thoughts on the check structure that I'd like to implement:

  1. Get rid of the misc category and move its checks to the root level to keep the structure flatter. Example: misc.numbers.check_xyz would become numbers.check_xyz.
  2. Interpret every function in a check file as a check, so checks can be addressed as numbers.xyz rather than numbers.check_xyz. This makes the structure cleaner and allows selection at the level of individual checks.
  3. In addition to 2): make one exception to the generalized check name. A prefix (or suffix) such as disabled_, beta_ or, better, preview_ marks faulty or beta checks. Use only one of these markers and allow enabling that preview class via a CLI argument.
  4. Items 2) and 3) would also allow removing the boilerplate err = "name" and autogenerating it.
  5. Maybe add a short name instead, like PNS001, to allow disabling a check in the config.
  6. Establish rules for when to use files and when to use directories. E.g., the oxymorons directory contains only one file with one check in it and should be a file. Directories should be used only when there are real subcategories, as distinct from sub-checks in a file.
  7. Divide checks into static ones that can use precompiled regexes (faster startup) and dynamic ones (url-checker, ...) that can't be precompiled.
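To make items 2) to 4) more concrete, here is a rough sketch of how discovery could work. It is only an illustration, not existing proselint code: `discover_checks`, `PREVIEW_PREFIX` and the `enable_preview` flag are hypothetical names, and the `numbers` module is built in memory as a stand-in for a flattened check file.

```python
import inspect
import types

PREVIEW_PREFIX = "preview_"  # hypothetical marker from item 3)


def discover_checks(module, enable_preview=False):
    """Map auto-generated check names to the public functions of a module."""
    checks = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        if name.startswith(PREVIEW_PREFIX):
            if not enable_preview:
                continue
            name = name[len(PREVIEW_PREFIX):]
        # The name comes from module and function, so no err = "name" is needed.
        checks[f"{module.__name__}.{name}"] = func
    return checks


def xyz(text):
    return []


def preview_abc(text):
    return []


# Stand-in for a flattened check file such as numbers.py (assumption).
numbers = types.ModuleType("numbers")
for func in (xyz, preview_abc):
    func.__module__ = numbers.__name__
    setattr(numbers, func.__name__, func)

print(sorted(discover_checks(numbers)))
print(sorted(discover_checks(numbers, enable_preview=True)))
```

With this layout, a preview check only shows up when the flag is set, and its marker is stripped from the addressable name.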

Update, to make this future-safe:

  1. Add language metadata to allow checks for other specific languages, or even general ones.
  2. Add keyword metadata for selecting groups of checks, like general, prose, nsfw, scientific.
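A minimal sketch of what such metadata and group selection might look like. Everything here is an assumption for illustration: the `CheckMeta` class, the `select` helper, the default values, and the check names in `REGISTRY` are hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckMeta:
    """Hypothetical per-check metadata record."""
    name: str
    languages: frozenset = frozenset({"en"})
    keywords: frozenset = frozenset({"general"})


def select(checks, language=None, keywords=None):
    """Return names of checks matching the language and at least one keyword."""
    wanted = frozenset(keywords or ())
    return [
        check.name
        for check in checks
        if (language is None or language in check.languages)
        and (not wanted or wanted & check.keywords)
    ]


# Illustrative registry; the check names are made up.
REGISTRY = [
    CheckMeta("numbers.xyz"),
    CheckMeta("hedging.misc", keywords=frozenset({"prose", "scientific"})),
    CheckMeta(
        "typography.symbols",
        languages=frozenset({"en", "de"}),
        keywords=frozenset({"general", "prose"}),
    ),
]

print(select(REGISTRY, keywords={"prose"}))
print(select(REGISTRY, language="de"))
```

Leaving both filters unset selects every check, so existing configs would keep working if the metadata defaults are sensible.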

orgua, Jan 22 '24 13:01

I like where this is going so far. Here are my additional proposals for the check structure.

Out-of-scope considerations: cursing, security, links, spelling. Each has merits, but they feel out of place given the goal of proselint.

Condensing considerations:

  • lgbtq, sexism and similar can be combined. This category might be named something like social_awareness
  • airlinese, corporate_speak and similar can be combined. This category might be named something like industrial_language. Optionally, jargon may be added to this.
  • mondegreens and malapropisms are commonly grouped together in analysis.
  • Punctuation spacing and typography should be together, too. Spacing is a purely typographic matter.
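If categories get merged like this, existing configs would need some way to translate old check paths. A possible sketch, assuming the suggested names social_awareness and industrial_language are adopted; the `RENAMES` table and `migrate` helper are hypothetical, and only the merges that already have a proposed name are listed.

```python
# Old top-level category -> proposed merged category (names from the list above).
RENAMES = {
    "lgbtq": "social_awareness",
    "sexism": "social_awareness",
    "airlinese": "industrial_language",
    "corporate_speak": "industrial_language",
}


def migrate(check_path):
    """Rewrite the leading category of a dotted check path, if it was merged."""
    head, sep, rest = check_path.partition(".")
    return RENAMES.get(head, head) + sep + rest


print(migrate("sexism.misc"))
print(migrate("numbers.xyz"))
```

Paths whose category is not in the table pass through unchanged.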

I also liked your previous suggestion of flattening categories with only one file. It might be worth making a standard for check naming, while we're at it.

Let me know what you think.

Nytelife26, Feb 04 '24 13:02