Pkg.jl
Use levsort for `add` suggestions
As @ericphanson pointed out, levsort seems to work better
[Screenshot: Screen Shot 2022-02-20 at 6 57 34 PM]
However, some of these are quite distant, especially for short names.
@ericphanson any idea of a better threshold to use here?
We could take the top 3 results, maybe? But that’s not ideal: if we have, say, a 3- or 4-letter package name, there might be many packages one substitution away, so we'd be depending on the fuzzy score for tie-breaking to decide the top 3 results. But the fuzzy score seems to roughly measure how closely the starts of the names match, and I'm not sure how useful that is.
In those examples though, at least the first result seems good, so maybe just limiting the number of results will help.
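To make the "top 3" idea concrete, here is a minimal sketch in Python (for illustration only, not Pkg.jl's code). It ranks candidates by plain Levenshtein distance and truncates the list. The names `levenshtein` and `suggest` are hypothetical, and the alphabetical tie-break is an assumption standing in for the fuzzy score discussed above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(query: str, names: list[str], limit: int = 3) -> list[str]:
    """Return the `limit` closest names; ties are broken alphabetically (an assumption)."""
    q = query.lower()
    ranked = sorted(names, key=lambda n: (levenshtein(q, n.lower()), n))
    return ranked[:limit]


if __name__ == "__main__":
    print(suggest("Plot", ["Plots", "Flux", "Plotly", "PlotUtils", "Glob"]))
```

With short queries, several names land at the same distance ("Glob" and "Plotly" are both 2 away from "Plot" here), so which ones make the cut depends entirely on the tie-break rule.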
It would be better to have a weighted Levenshtein distance that used empirical typo frequencies on, say, QWERTY keyboards to determine the weights. Then we wouldn’t have so many ties, since the weights would give us a better ranking. (I’d like this for RegistryCI too!) But that’s probably out of scope for this…
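A rough sketch of what a weighted distance could look like, again in Python and purely illustrative. The weights are made up, not empirical: substituting a physically adjacent QWERTY key costs 0.5, any other edit costs 1.0. The adjacency table is a crude approximation of the keyboard layout, and `sub_cost` and `weighted_levenshtein` are hypothetical names.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def _adjacent_pairs() -> set[frozenset[str]]:
    """Approximate QWERTY neighbours: same-row neighbours plus the keys just below."""
    pairs = set()
    for r, row in enumerate(ROWS):
        for c, ch in enumerate(row):
            if c + 1 < len(row):
                pairs.add(frozenset((ch, row[c + 1])))
            if r + 1 < len(ROWS):
                below = ROWS[r + 1]
                # Each lower row is shifted right, so a key touches columns c-1 and c below it.
                for cc in (c - 1, c):
                    if 0 <= cc < len(below):
                        pairs.add(frozenset((ch, below[cc])))
    return pairs


ADJACENT = _adjacent_pairs()


def sub_cost(a: str, b: str) -> float:
    """Hypothetical weights: 0 for a match, 0.5 for neighbouring keys, 1.0 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in ADJACENT else 1.0


def weighted_levenshtein(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                   # deletion
                cur[j - 1] + 1.0,                # insertion
                prev[j - 1] + sub_cost(ca, cb),  # weighted substitution
            ))
        prev = cur
    return prev[-1]
```

Under these assumed weights, "Glux" is 0.5 away from "Flux" while "Plux" is 1.0 away, so two candidates that tie under plain Levenshtein get ranked differently.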
It could be a pkgserver query. Send a name, get back a shortlist of suggestions. Then we could throw any/all algorithms at it. Package popularity, seo ranking etc etc
I think that was @StefanKarpinski's suggestion somewhere. Though it's a fair bit of work, and wouldn't work for other registries or local packages
I don't think we have a weighted Levenshtein (or even better, Damerau-Levenshtein) in Julia, so we'd still need to write that if we wanted that algorithm, but at least it could be a separate package, true. Seems a lot simpler to not need to make a network call though.
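For reference, the difference Damerau-Levenshtein makes is that swapping two adjacent characters counts as one edit instead of two. Below is a minimal Python sketch of the restricted variant (optimal string alignment), which is simpler than full Damerau-Levenshtein and can give larger distances in some cases. `osa_distance` is a hypothetical name.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions at cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "st" <-> "ts"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]
```

For a typo like "Plost" for "Plots", this gives distance 1, where plain Levenshtein gives 2.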
> Package popularity, seo ranking etc etc
That does sound cool... I think at this point, though, we don't have so many packages or name clashes that we need that kind of indicator, and hopefully, with automerge's nudge towards name dissimilarity, it can stay that way for a while. I could see how this could become necessary at some point, especially if typosquatting started to happen... but ideally that can be handled better at the registry maintenance level.
> wouldn't work for other registries or local packages
I suppose you could set up a private registry with a private package server and implement the API endpoint for name similarity, and that could work. But as someone who uses a private registry without a package server, I'd still prefer it to work without a package server set up :).
Cross-referencing #1834, for historical record.