
Data quality: canonical homepage URLs

Open • dfandrich opened this issue 5 months ago • 1 comment

Many projects have homepage URLs that are out of date or otherwise noncanonical. These should be fixed in bulk to improve the quality of data in Anitya and make it easier for consumers to use. I'm not aware of an efficient API that allows these kinds of entries to be found, so they would likely need to be fixed directly in the DB.

  • rubygems.org URLs that include specific version numbers, e.g. kojo has https://rubygems.org/gems/kojo/versions/0.3.2 when it should be https://rubygems.org/gems/kojo/
  • pypi.org URLs that include specific version numbers, e.g. pyinstallerui has https://pypi.org/project/pyinstallerui/0.0.1 when it should be https://pypi.org/project/pyinstallerui/
  • pypi.python.org URLs: these should be on the pypi.org domain, which is where they currently redirect (and have done so since 2018), e.g. landslide
  • search.cpan.org URLs: these currently redirect to https://metacpan.org/ (and have done so since 2018), e.g. FCGI
  • metacpan.org/release/X URLs: these redirect to metacpan.org/dist/X, e.g. Mail-Alias
  • SourceForge URLs: SourceForge has a number of aliases that forward to a canonical project home page, e.g. sf.net/p/X, sf/projects/X, sourceforge.net/p/X
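To make the rewrites above concrete, here is a minimal sketch of how some of them could be expressed as regular-expression rules. It is only an illustration and not part of Anitya: the `canonicalize_homepage` function, the `_RULES` table, and the exact patterns are hypothetical. The pypi.python.org/pypi/X path form and its pypi.org/project/X/ target are assumptions too. The search.cpan.org and SourceForge cases are left out because the issue does not spell out their target URLs.

```python
import re

# Hypothetical rewrite rules: (pattern, replacement). The patterns are
# guesses derived from the examples in this issue, not from Anitya's code.
_RULES = [
    # rubygems.org/gems/NAME/versions/VERSION -> rubygems.org/gems/NAME/
    (re.compile(r"^https?://rubygems\.org/gems/([^/]+)/versions/[^/]+/?$"),
     r"https://rubygems.org/gems/\1/"),
    # pypi.org or pypi.python.org project pages, with or without a version
    # segment -> pypi.org/project/NAME/ (the /pypi/NAME form is an assumption)
    (re.compile(r"^https?://pypi(?:\.python)?\.org/(?:project|pypi)/([^/]+)(?:/[^/]*)?/?$"),
     r"https://pypi.org/project/\1/"),
    # metacpan.org/release/X -> metacpan.org/dist/X (single path segment only)
    (re.compile(r"^https?://metacpan\.org/release/([^/]+)/?$"),
     r"https://metacpan.org/dist/\1"),
]


def canonicalize_homepage(url):
    """Return the canonical form of url, or url unchanged if no rule applies."""
    url = url.strip()
    for pattern, replacement in _RULES:
        if pattern.match(url):
            return pattern.sub(replacement, url)
    return url


if __name__ == "__main__":
    for example in (
        "https://rubygems.org/gems/kojo/versions/0.3.2",
        "https://pypi.org/project/pyinstallerui/0.0.1",
        "https://metacpan.org/release/Mail-Alias",
    ):
        print(example, "->", canonicalize_homepage(example))
```

A function like this could be run over the homepage column in a dry-run pass first, logging each proposed change for review before anything is written back to the DB.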

dfandrich • Jul 18 '25 20:07

I agree, I just don't have time to work on that currently.

Zlopez • Jul 21 '25 09:07