[feature-proposal] streamline module-structure
While modernizing the codebase I had some thoughts on the check structure that I'd like to implement:

1. Get rid of the `misc` category and move its checks to the root level to keep the structure flatter. Example: `misc.numbers.check_xyz` would become `numbers.check_xyz`.
2. Every function in a check script file should be interpreted as a check, so checks can be addressed by `numbers.xyz` and not `numbers.check_xyz`. This makes the structure cleaner and allows individual selection at check level.
3. In addition to 2): generalize check names, with one exception. A `disabled_`, `beta_`, or better `preview_` prefix (or suffix) marks faulty or beta checks. Use just one of them and allow enabling that preview class via a CLI argument.
4. Points 2) and 3) would also allow removing the boilerplate `err = "name"` and auto-generating it.
   - Maybe add a short name instead, like `PNS001`, to allow disabling the check in the config.
5. Establish a rule system for when to use files and when to use directories. For example, the `oxymorons` directory has only one file with one check in it and should be a file. Directories should only be used when there are real subcategories, as distinct from sub-checks in a file.
6. Divide checks into static ones that can have precompiled regexes (faster startup) and dynamic ones (URL checker, ...) that can't be precompiled.
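To make points 2) to 4) concrete, here is a rough sketch of how discovery could work. It is not the current proselint code: `collect_checks`, `PREVIEW_PREFIX`, the `include_preview` flag, and the stand-in `numbers` module are all hypothetical names, and stripping the prefix from the generated name is just one possible choice.

```python
import inspect
import types
from typing import Callable, Dict

PREVIEW_PREFIX = "preview_"  # hypothetical marker for beta or faulty checks


def collect_checks(
    module: types.ModuleType, include_preview: bool = False
) -> Dict[str, Callable]:
    """Treat every public function defined in `module` as a check.

    The check name is generated from the module and function name,
    so no `err = "name"` boilerplate is needed inside the check.
    """
    category = module.__name__.rsplit(".", 1)[-1]
    checks: Dict[str, Callable] = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions imported from elsewhere.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        if name.startswith(PREVIEW_PREFIX):
            if not include_preview:
                continue  # preview checks are opt-in
            name = name[len(PREVIEW_PREFIX):]
        checks[f"{category}.{name}"] = func
    return checks


# Stand-in for a check file named numbers.py (hypothetical content).
numbers = types.ModuleType("numbers")


def xyz(text):
    return []


def preview_abc(text):
    return []


for _func in (xyz, preview_abc):
    _func.__module__ = "numbers"  # pretend the function lives in numbers.py
    setattr(numbers, _func.__name__, _func)
```

With this, `collect_checks(numbers)` only yields `numbers.xyz`, while passing `include_preview=True` also yields the preview check. A CLI flag could simply map to that argument.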
Update, to make this future-proof:
- add language metadata to allow checks for specific languages, or even general ones
- add keyword metadata for selecting groups of checks, such as general, prose, nsfw, scientific
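A minimal sketch of what such metadata and selection could look like, assuming a per-check record. `CheckMeta`, `select`, `REGISTRY`, and the two example check names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class CheckMeta:
    """Hypothetical metadata record attached to one check."""

    name: str
    language: str = "en"
    keywords: FrozenSet[str] = frozenset()


def select(
    checks: Iterable[CheckMeta], language: str, keywords: Iterable[str] = ()
) -> List[str]:
    """Return names of checks for `language`.

    If `keywords` is non-empty, a check must share at least one keyword.
    """
    wanted = set(keywords)
    return [
        check.name
        for check in checks
        if check.language == language and (not wanted or wanted & check.keywords)
    ]


# Example registry with made-up checks.
REGISTRY = [
    CheckMeta("numbers.xyz", keywords=frozenset({"general", "scientific"})),
    CheckMeta("cursing.abc", keywords=frozenset({"nsfw"})),
]
```

Selecting by keyword group is then a single call, for example `select(REGISTRY, "en", ["scientific"])`.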
I like where this is going so far. Here are my additional proposals for the check structure.
Out-of-scope considerations: cursing, security, links, spelling. Each has merits, but they feel out of place given the goal of proselint.
Condensing considerations:
- `lgbtq`, `sexism` and similar can be combined. This category might be named something like `social_awareness`.
- `airlinese`, `corporate_speak` and similar can be combined. This category might be named something like `industrial_language`. Optionally, `jargon` may be added to this.
- `mondegreens` and `malapropisms` are commonly grouped together in analysis.
- Punctuation spacing and `typography` should be together, too. Spacing is a purely typographic matter.
I also liked your previous suggestion of flattening categories with only one file. It might be worth making a standard for check naming, while we're at it.
Let me know what you think.