False positives: "can use", "via ssh"

Open quackduck opened this issue 3 years ago • 7 comments

Of course, I could add these to the false positives list, but maybe there's a better, more general way to tackle these.

quackduck avatar Apr 17 '22 21:04 quackduck

Yeah, adding canuse and viassh to the default list of false positives is probably going to be the easiest way to tackle this.

TwiN avatar Apr 17 '22 23:04 TwiN

True. My issue was more about whether there could be a way to detect these innocent, legitimate two-word messages.

quackduck avatar Apr 21 '22 04:04 quackduck

Yeah there isn't really one besides using the false positives list.

You could create a PR to add them to the default false positives if you'd like: https://github.com/TwiN/go-away/blob/b5570dbc7793ae5f946af94fb431c4a73b64e0b3/falsepositives.go#L4

TwiN avatar Apr 22 '22 23:04 TwiN

> Yeah there isn't really one besides using the false positives list.

It would take some work on your end, but you could process my comprehensive false positives list in a code generator, as follows:

  • Read file line by line
  • Feed each line into goaway
  • If it detects something, add it as a false positive (or tell me if something bad ended up in the list :wink:)

If you're wondering, I generated it using a dictionary search of words and pairs of words, combined with my own additions.

The downside is that my filter operates a bit differently (has some interesting heuristics), and doesn't require certain false positives to be explicitly included in its list. In these cases, you would still need to maintain your own false positive list and/or replicate the dictionary search.

finnbear avatar May 10 '22 06:05 finnbear

Thanks for commenting! @TwiN this could also be a good place to use go:embed (then decode on init() possibly)

(I’m curious: how did you find this thread @finnbear?)

quackduck avatar May 10 '22 07:05 quackduck

> this could also be a good place to use go:embed (then decode on init() possibly)

True! The downside here is that you would be including the entire list, when only a subset is relevant to goaway. A build step/code generator is more work, but could avoid wasting space in the compiled binary by filtering in advance.

> (I’m curious: how did you find this thread @finnbear?)

I check in on this repository every once in a while, as it was and is a great source of inspiration for my profanity filters :smiley:

finnbear avatar May 10 '22 07:05 finnbear

> True! The downside here is that you would be including the entire list, when only a subset is relevant to goaway.

We could trim the file once, as needed, and commit only the trimmed version.

quackduck avatar May 10 '22 13:05 quackduck