opam-repo-ci
Add typosquatting lint check
As per https://github.com/ocurrent/opam-repo-ci/issues/378, we removed the part of name collision detection that used a Levenshtein distance. We found this was not a helpful metric, and it only gave false positives.
@punchagan did some research and found prior work focused specifically on detecting typosquatting. We think this would be a helpful replacement for the removed Levenshtein distance check.
From https://github.com/ocurrent/opam-repo-ci/issues/378#issuecomment-2392990004
There's some prior work done on other package archives (like PyPI, npm and Rust's crates) in this [paper](https://arxiv.org/pdf/2003.03471), and the packages based on / related to it: typogard and typomania.
The paper (and the packages) primarily focus on malicious typosquatting, and those package repositories are much larger than opam. But we could adapt the Typosquatting Signals (Sec 3.3) explored in the paper for our use case. They use a concept of popular (and unpopular) packages for detecting malicious typosquatting, but we probably don't need that for our use case, given that we aren't checking strictly for malicious typosquatting, our repository is small, and package additions and updates go through a manual approval process.
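
To make the idea concrete, here is a rough sketch of what signal-based checks could look like in OCaml. It is only an illustration: the signal names, the `signals` function and the choice of delimiters are assumptions made for this example, not the exact set from the paper and not existing opam-repo-ci code. It compares a candidate name against one existing name and reports which signals fire, without any notion of package popularity.

```ocaml
(* Hypothetical signal set, loosely inspired by the kind of signals the
   paper discusses. The names are illustrative only. *)
type signal =
  | Delimiter_only (* names differ only in '-', '_' or '.' *)
  | Repeated_char  (* candidate doubles one character of the existing name *)
  | Omitted_char   (* candidate drops one character of the existing name *)
  | Swapped_chars  (* two adjacent characters are transposed *)
  | Swapped_words  (* delimiter-separated words are reordered *)

let is_delim c = c = '-' || c = '_' || c = '.'

(* Drop delimiters so "foo-bar" and "foo_bar" compare equal. *)
let strip_delims s =
  String.to_seq s |> Seq.filter (fun c -> not (is_delim c)) |> String.of_seq

(* Split a name into words on any delimiter, ignoring empty pieces. *)
let words s =
  String.map (fun c -> if is_delim c then '-' else c) s
  |> String.split_on_char '-'
  |> List.filter (fun w -> w <> "")

(* [remove_at s i] is [s] without the character at index [i]. *)
let remove_at s i =
  String.sub s 0 i ^ String.sub s (i + 1) (String.length s - i - 1)

(* All strings obtained by deleting exactly one character of [s]. *)
let deletions s = List.init (String.length s) (remove_at s)

(* True if [longer] is [shorter] with one character doubled. *)
let has_repeated_char ~shorter ~longer =
  String.length longer = String.length shorter + 1
  && List.exists Fun.id
       (List.init (String.length longer - 1) (fun i ->
            longer.[i] = longer.[i + 1] && remove_at longer i = shorter))

(* [swap_at s i] transposes the characters at indices [i] and [i + 1]. *)
let swap_at s i =
  let b = Bytes.of_string s in
  Bytes.set b i s.[i + 1];
  Bytes.set b (i + 1) s.[i];
  Bytes.to_string b

(* True if [b] is [a] with exactly one adjacent pair transposed. *)
let has_swapped_chars a b =
  a <> b
  && String.length a = String.length b
  && List.exists
       (fun i -> swap_at a i = b)
       (List.init (max 0 (String.length a - 1)) Fun.id)

(* True if both names contain the same words in a different order. *)
let has_swapped_words a b =
  let wa = words a and wb = words b in
  wa <> wb && List.sort compare wa = List.sort compare wb

(* Signals that fire for [candidate] against one [existing] name.
   Identical names yield no signal. *)
let signals ~existing ~candidate =
  if existing = candidate then []
  else
    List.filter_map
      (fun (signal, fired) -> if fired then Some signal else None)
      [ (Delimiter_only, strip_delims existing = strip_delims candidate);
        (Repeated_char, has_repeated_char ~shorter:existing ~longer:candidate);
        (Omitted_char, List.mem candidate (deletions existing));
        (Swapped_chars, has_swapped_chars existing candidate);
        (Swapped_words, has_swapped_words existing candidate) ]
```

For example, `signals ~existing:"dune" ~candidate:"duen"` evaluates to `[Swapped_chars]`, and `signals ~existing:"cohttp-lwt" ~candidate:"lwt-cohttp"` to `[Swapped_words]`. A lint along these lines would run the new package name against every existing name and flag any non-empty result for the reviewer.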