
modernization of proselint

Open orgua opened this issue 5 months ago • 31 comments

Hello,

I spent a few days improving the codebase, and this should resolve a lot of open issues. BUT the changes are far more substantial than planned, so I wanted to ask how to proceed.

This repo has not been very active in recent years. I would like to offer to maintain the project for the time being, and I'm open to guiding input. Some of the open pull requests would benefit the project. I also have plans for new features, better ease of use, more tests for correctness, and other optimizations.

I'm open to a chat. My email address is in the setup.cfg of one of my projects.

If there are other plans for the project, I would probably fork it and start fresh, but that is not my preferred solution.

Biggest Changes

  • feature set of Python 3.8 (py38)
  • cleaner and faster code
  • fixed bugs, undefined behavior, and broken legacy code
  • fixed checks that did not work as intended
  • parallelization via multiprocessing, plus other optimizations
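
The multiprocessing approach (one global executor that collects the check tasks for all files, as listed in the detailed changelog) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from the branch: the check signature `check(text) -> list of strings`, the two toy checks, and `lint_all` are all hypothetical.

```python
"""Sketch: one global executor runs every (file, check) pair in parallel.

Assumption: each check is a top-level function ``check(text) -> List[str]``
(hypothetical signature, not proselint's actual API). Top-level functions
are picklable, which ProcessPoolExecutor requires.
"""
import re
from collections import defaultdict
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Dict, List, Type

Check = Callable[[str], List[str]]


def check_very_unique(text: str) -> List[str]:
    # Hypothetical toy check: flag the phrase "very unique".
    return [f"very unique at {m.start()}" for m in re.finditer(r"\bvery unique\b", text)]


def check_double_space(text: str) -> List[str]:
    # Hypothetical toy check: flag two consecutive spaces.
    return [f"double space at {m.start()}" for m in re.finditer(r"  ", text)]


def lint_all(
    texts: Dict[str, str],
    checks: List[Check],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Submit every (file, check) task to one shared executor, then regroup."""
    results: Dict[str, List[str]] = defaultdict(list)
    with executor_cls(max_workers=max_workers) as pool:
        # Submitting all tasks up front keeps the workers busy across files
        # instead of finishing one file's checks before starting the next.
        futures = {
            pool.submit(check, text): name
            for name, text in texts.items()
            for check in checks
        }
        for future, name in futures.items():
            # Accessing results[name] also creates an empty entry for clean files.
            results[name].extend(future.result())
    return dict(results)
```

Passing `executor_cls=ThreadPoolExecutor` runs the same task layout without processes, which is handy in tests. With `ProcessPoolExecutor`, call `lint_all` under an `if __name__ == "__main__":` guard, because platforms that spawn worker processes (such as Windows) re-import the main module.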

Benchmark Comparison

  • custom set of files, same hardware

Windows

Proselint-main (uncached & cached)

  • Found 104 lint-warnings in 47.579 s
  • Found 104 lint-warnings in 0.878 s

Proselint-modernized

  • Found 108 lint-warnings in 39.930 s (12 files, 617.45 kiByte) -> serialized
  • Found 108 lint-warnings in 13.164 s (12 files, 617.45 kiByte) -> serial files, parallel checks
  • Found 108 lint-warnings in 9.931 s (12 files, 617.45 kiByte) -> global check-executor
  • Found 108 lint-warnings in 0.011 s (12 files, 617.45 kiByte) -> cached

Linux / WSL

Proselint-main (uncached & cached)

  • Found 104 lint-warnings in 37.041 s
  • Found 104 lint-warnings in 0.771 s

Proselint-modernized

  • Found 108 lint-warnings in 34.521 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 8.248 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 6.098 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 0.044 s (12 files, 617.45 kiByte)
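
For a rough sense of scale, the uncached numbers above can be turned into speedup factors. This small sketch divides proselint-main's uncached time by the fastest uncached proselint-modernized run on each platform. It is only arithmetic on the reported numbers, and since the two versions report different warning counts (104 vs 108), it is not a strictly like-for-like comparison.

```python
# Uncached timings in seconds, copied from the benchmark lists above.
MAIN = {"Windows": 47.579, "Linux / WSL": 37.041}
MODERNIZED = {
    "Windows": [39.930, 13.164, 9.931],
    "Linux / WSL": [34.521, 8.248, 6.098],
}


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return baseline / candidate


def best_speedups() -> dict:
    # Compare proselint-main against the fastest uncached modernized run.
    return {
        os_name: round(speedup(MAIN[os_name], min(runs)), 2)
        for os_name, runs in MODERNIZED.items()
    }


print(best_speedups())  # {'Windows': 4.79, 'Linux / WSL': 6.07}
```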

Breaking changes

  • CLI arg --time renamed to --benchmark, -b
  • CLI arg --debug renamed to --verbose, -v
  • the exit code now returns the number of errors
  • CLI arg --output-format replaces the old json and compact flags
  • requires Python >= 3.8
  • API changes: web_scripts and plugins are untested
  • config adjustments (1 test removed, 1 added)
    • "misc.metaconcepts": False, # TODO: remove, was duplicate of scare_quotes
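
A minimal argparse sketch of the renamed flags and the error-count exit code. The flag names come from the list above; the "full" output-format value, `build_parser()`, `exit_code()` and `main()` are hypothetical. The clamp to 255 is also an assumption, offered as one way to stop a large error count from wrapping to 0 ("success"), since POSIX keeps only the low 8 bits of an exit status.

```python
"""Sketch of the renamed CLI flags and the error-count exit code.

Flag names come from the issue; the "full" format value, main() and the
placeholder linting step are hypothetical.
"""
import argparse
import json
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="proselint")
    parser.add_argument("paths", nargs="*")
    # --time was renamed to --benchmark / -b
    parser.add_argument("-b", "--benchmark", action="store_true")
    # --debug was renamed to --verbose / -v
    parser.add_argument("-v", "--verbose", action="store_true")
    # One option replaces the old separate json / compact flags.
    parser.add_argument(
        "--output-format", choices=["full", "json", "compact"], default="full"
    )
    return parser


def exit_code(n_errors: int) -> int:
    # POSIX keeps only the low 8 bits of an exit status, so 256 errors would
    # wrap to 0; clamping to 255 keeps any failure non-zero.
    return min(n_errors, 255)


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    errors: List[dict] = []  # placeholder: real linting would fill this
    if args.output_format == "json":
        print(json.dumps({"errors": errors}))
    return exit_code(len(errors))
```

An entry point would then call `sys.exit(main())`, so shell scripts and CI can read the error count from the exit status.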

What's broken / untested

  • web_scripts
  • plugins
  • GitHub Action

Detailed Changelog

  • feature set of Python 3.8 (py38)
  • use pathlib instead of string-based paths
  • add type hints
  • replace the memoizer, which had several shortcomings:
    • shelves have a good interface but are slow
    • cache initialization and closing were a mess
    • clearing the cache resulted in broken states
    • the memoize wrapper wasted resources (e.g. the text hash was recalculated for every check)
    • the age of cache entries was not considered
    • every cached function had its own file cache
    • cache migration was done on every memoize decoration
  • cache
    • hash the text only once, beforehand
    • consider the age of entries
    • memoize lint() instead of individual checks
  • fix bugs related to:
    • shadowing built-in names (list, ...)
    • string-based path traversal
    • a mix of relative and absolute paths (including relics that deleted a file somewhere)
    • various small bugs (e.g. a missing comma in element lists)
    • the cache was not cleaned up by its function (as it should have been)
  • lots of modernization of the old codebase
    • the latest Python versions had trouble with proselint
  • update dependencies
  • reformat in Black style, done by ruff
  • lint the codebase with ruff
  • add a logger and refactor the print output
    • the linter reports its duration
    • results include a link to the file
    • report duration, files scanned, and text size
  • refactor unit tests
    • remove classes / boilerplate
    • move tests for checks into a separate directory
    • more helpful error messages
    • remove duplicates
    • repair broken tests
    • lower the complexity of tests (each test should test only one thing)
  • add unit tests
    • detect check flags missing from the default config
    • find default-config flags that point to unavailable checks
    • don't fail on wrong config flags
    • verify that the cache works: speed test
    • verify that the cache works: same results
  • minimal changes to the outer API
  • config
    • correctness of the default config is tested
    • don't fail on faulty config flags
  • checks
    • remove duplicates
    • fix padding for checks using regex
  • existence_check(): join and padding cleaned up
    • joining is always done
    • padding can be selected via an Enum and defaults to "separate in text"
    • fix checks accordingly
  • move root scripts into a separate directory
  • exit handler for more graceful Ctrl+C handling
  • the ppm wrapper is now more forgiving with small sample sizes
  • add parallelization
    • checks() are run in multiple processes
    • one global executor collects the tasks for all files to analyze
    • break up the slowest checks to balance the load
  • optimizations were guided by profiling and benchmarking
  • overhaul how the output format is controlled via the CLI
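
The cache rework (hash the text once, consider the age of entries, memoize lint() instead of individual checks) could look roughly like this in-memory sketch. `LintCache`, `cached_lint`, `config_id`, `max_age_s` and the 7-day default are hypothetical; the actual branch may persist entries to disk and key them differently.

```python
"""Sketch: memoize a whole lint() call, keyed by one hash of the text.

Hypothetical names (LintCache, cached_lint, config_id, max_age_s) are
illustrations, not proselint's actual implementation.
"""
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple

Result = List[str]


class LintCache:
    """In-memory cache of whole lint() results with a maximum entry age."""

    def __init__(
        self,
        max_age_s: float = 7 * 24 * 3600,  # hypothetical default: one week
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.max_age_s = max_age_s
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Result]] = {}  # key -> (stamp, result)

    @staticmethod
    def key(text: str, config_id: str) -> str:
        # Hash the text once per lint() call, not once per check.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{config_id}:{digest}"

    def get(self, key: str) -> Optional[Result]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stamp, result = entry
        if self.clock() - stamp > self.max_age_s:
            del self._entries[key]  # stale: drop it instead of serving it
            return None
        return result

    def put(self, key: str, result: Result) -> None:
        self._entries[key] = (self.clock(), result)

    def prune(self) -> int:
        """Remove all expired entries and return how many were removed."""
        now = self.clock()
        stale = [k for k, (t, _) in self._entries.items() if now - t > self.max_age_s]
        for k in stale:
            del self._entries[k]
        return len(stale)


def cached_lint(
    text: str, config_id: str, lint: Callable[[str], Result], cache: LintCache
) -> Result:
    key = LintCache.key(text, config_id)
    hit = cache.get(key)
    if hit is not None:  # an empty result (clean text) still counts as a hit
        return hit
    result = lint(text)  # all checks run only on a cache miss
    cache.put(key, result)
    return result
```

Keying on a configuration identifier as well as the text hash is an assumption meant to keep results computed with one set of enabled checks from being served for another. The injectable `clock` makes the age handling easy to test without waiting.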

orgua, Jan 05 '24 23:01