Add spell check linter

Open mtdowling opened this issue 3 years ago • 2 comments

It would be nice to have a spell check model linter that looks for spelling issues in shape names, member names, documentation, and maybe any string. Checking any string might be hard, but it would be interesting to see if it's possible and/or worthwhile (i.e., it could lead to severe performance issues and too many false positives).

This linter should have a default dictionary that can be appended to using a custom newline-separated string of words. Custom words are a hard requirement since most models use domain-specific terminology that isn't feasible to capture in the default list of words. The spell checker doesn't necessarily need to offer spelling suggestions, which likely makes it easier to implement. Sentences would need to be broken down into individual words by tokenizing strings on characters like " ", "-", ",", ".", ";", ":", "_", etc.
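To make the tokenizing and custom-word idea concrete, here is a minimal sketch in Python. It is not Smithy code: the function names (`tokenize`, `parse_word_list`, `find_unknown_words`) are hypothetical, and splitting PascalCase shape names such as `GetWidgetStatus` into words is an extra assumption that the description above does not spell out.

```python
import re

# Separators taken from the examples listed in the issue (whitespace, "-", ",", ".", ";", ":", "_").
_SEPARATORS = re.compile(r"[\s\-,.;:_]+")
# Assumption: identifiers like "GetWidgetStatus" or "HTTPServer" are also split on case boundaries.
_CASE_WORDS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenize(text):
    """Break a string into lowercase words using the separators and case boundaries."""
    words = []
    for chunk in _SEPARATORS.split(text):
        words.extend(match.lower() for match in _CASE_WORDS.findall(chunk))
    return words


def parse_word_list(text):
    """Turn a newline-separated string of words into a set of lowercase words."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def find_unknown_words(text, default_words, custom_words=""):
    """Return the words in text that are in neither the default nor the custom list."""
    known = set(default_words) | parse_word_list(custom_words)
    return [word for word in tokenize(text) if word not in known]
```

With a default list containing only `get` and `status`, calling `find_unknown_words("GetWidgetStatus", {"get", "status"})` reports `widget`, and passing `custom_words="widget\n"` clears the finding. No suggestions are produced, in line with the note that they aren't required.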

The best dictionary I know of is https://github.com/dwyl/english-words, though the license is unclear, and we'll need to filter out bad words. The dictionary is around 4 MB, so we'll need to make sure we don't have to load the file repeatedly or store it in memory multiple times.
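One way to avoid loading the file repeatedly or holding several copies in memory is to cache the parsed word set per path. The sketch below is in Python and only illustrates the idea; `load_dictionary` is a hypothetical name, and the file is assumed to contain one word per line.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_dictionary(path):
    """Read a newline-separated word list and return it as an immutable set.

    The result is cached per path, so repeated calls reuse the same object
    instead of re-reading the file or building a second copy.
    """
    with open(path, encoding="utf-8") as handle:
        return frozenset(line.strip().lower() for line in handle if line.strip())
```

Returning a `frozenset` means every caller can share the cached object safely, since none of them can mutate it.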

mtdowling, Dec 10 '20 20:12

> we'll need to filter out bad words

This is remarkably harder than you'd think. I've yet to find a list that's able to filter out everything just in the listed repo, and even applying stemming techniques only gets you so far.

JordonPhillips, Dec 28 '20 13:12

Being able to ignore words that match regular expression patterns, in addition to dictionaries of specific words, is critical.
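As a rough illustration of combining both kinds of ignore rule, here is a Python sketch. The names `is_ignored` and `filter_findings` are hypothetical, and requiring a pattern to match the whole word (rather than any substring) is an assumption.

```python
import re


def is_ignored(word, ignore_words, ignore_patterns):
    """True if the word is explicitly listed or fully matches any ignore pattern."""
    return word in ignore_words or any(
        pattern.fullmatch(word) for pattern in ignore_patterns
    )


def filter_findings(candidates, ignore_words, ignore_patterns):
    """Drop candidate misspellings that an ignore word or pattern covers."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    return [word for word in candidates if not is_ignored(word, ignore_words, compiled)]
```

For example, a pattern such as `[a-z]+\d+` would suppress findings like `ec2` without having to list each such identifier as a custom word.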

PatMyron, May 08 '21 03:05