
Investigation into "canonical" link for a PyPI repo link

Open hugovk opened this issue 6 years ago • 10 comments

Summary: use Source


In addition to url (alias homepage), packages on PyPI can have this metadata:

project_urls

An arbitrary map of URL names to hyperlinks, allowing more extensible documentation of where various resources can be found than the simple url and download_url options provide.

The url homepage is added into project_urls as Homepage. For example, Pillow doesn't define any project_urls but does have url="http://python-pillow.org", and https://pypi.org/pypi/Pillow/json includes:

"home_page": "http://python-pillow.org",
"project_urls": {
    "Homepage": "http://python-pillow.org"
},
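As a rough illustration of reading these fields, here is a minimal sketch using only the standard library. It assumes the fields sit under the top-level "info" key of the JSON response; the function names fetch_metadata and get_project_urls are made up for this example and are not part of pypi-tools.

```python
import json
from urllib.request import urlopen


def fetch_metadata(package: str) -> dict:
    """Download a package's JSON metadata from PyPI (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)


def get_project_urls(metadata: dict) -> dict:
    """Return the project_urls map, or an empty dict if it is missing or null."""
    # Assumption: project_urls lives under the top-level "info" key.
    return metadata.get("info", {}).get("project_urls") or {}


# Usage (requires network): get_project_urls(fetch_metadata("Pillow"))
```

The `or {}` fallback matters because a package without any URLs can report project_urls as null rather than as an empty map.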

Many projects have a link to their GitHub (or GitLab or Bitbucket etc.) repos as the homepage. For those that include an arbitrary link to a source repo, what is the most common one, when not the homepage?

Checking the current top 5,000 packages, here is the project_urls key under which a source repo was found (defined as a URL containing one of github.com, gitlab.com, bitbucket.org or bitbucket.com):

Counter({'Homepage': 3711,
         None: 1047,
         'Source': 95,
         'Download': 63,
         'Source Code': 38,
         'Code': 14,
         'Issue Tracker': 5,
         'Repository': 5,
         'GitHub: issues': 4,
         'Github': 3,
         'Bug Tracker': 3,
         'Bug Reports': 2,
         'Issue tracker': 2,
         'Source code': 2,
         'Twine source': 1,
         'Issues': 1,
         'Github repo': 1,
         'Change log': 1,
         'Changelog': 1,
         'GitHub': 1})

Some of these are specific things, like links to tarball downloads, or issue trackers. But the most common ones for a repo homepage are Source, Source Code and Code.

  • I'll use Source for adding new ones.
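The tally above could be produced along these lines. This is only a sketch, not the actual script: it assumes each package contributes the first key (in dict order) whose URL contains one of the four hosts, and None when nothing matches. The names REPO_HOSTS, find_repo_key and count_repo_keys are hypothetical.

```python
from collections import Counter

# Hosts that mark a URL as a source repo, as listed in the text above.
REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "bitbucket.com")


def find_repo_key(project_urls):
    """Return the first key whose URL contains a known repo host, else None."""
    for key, url in (project_urls or {}).items():
        if any(host in url for host in REPO_HOSTS):
            return key
    return None


def count_repo_keys(all_project_urls):
    """Tally, over many packages, which key held the repo link."""
    return Counter(find_repo_key(urls) for urls in all_project_urls)
```

Because the match is a plain substring test, links to issue trackers or release pages on the same hosts are counted too, which would explain keys such as Download and Issue Tracker showing up in the result.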

hugovk, Nov 03 '19 19:11

SCM links in project_urls, preview:

Load data/top-repos.json...
Load data/top-pypi-packages.json...
Already done: 0
Find new repos...
Homepage	https://github.com/benjaminp/six
Homepage	https://github.com/boto/botocore
Homepage	https://github.com/boto/s3transfer
Homepage	https://github.com/kjd/idna
Homepage	https://github.com/chardet/chardet
Homepage	https://github.com/etingof/pyasn1
Homepage	https://github.com/yaml/pyyaml
Homepage	https://github.com/jmespath/jmespath.py
Homepage	https://github.com/pypa/setuptools
Homepage	https://github.com/agronholm/pythonfutures
Homepage	https://github.com/tartley/colorama
Homepage	https://github.com/boto/boto3
Homepage	https://github.com/simplejson/simplejson
Source Code	https://github.com/numpy/numpy
Homepage	https://github.com/pypa/wheel
Download	https://github.com/protocolbuffers/protobuf/releases
...
Homepage	https://github.com/broadinstitute/keras-resnet
Homepage	https://github.com/CyberZHG/keras-position-wise-feed-forward
Homepage	https://github.com/makinacorpus/django-admin-watchdog
Old repos: 0
New repos: 3953
Not found: 1047
Counter({'Homepage': 3711,
         None: 1047,
         'Source': 95,
         'Download': 63,
         'Source Code': 38,
         'Code': 14,
         'Issue Tracker': 5,
         'Repository': 5,
         'GitHub: issues': 4,
         'Github': 3,
         'Bug Tracker': 3,
         'Bug Reports': 2,
         'Issue tracker': 2,
         'Source code': 2,
         'Twine source': 1,
         'Issues': 1,
         'Github repo': 1,
         'Change log': 1,
         'Changelog': 1,
         'GitHub': 1})

Full list:

hugovk, Nov 03 '19 19:11

The project_urls for each of the top 5,000, preview:

{'Homepage': 'https://urllib3.readthedocs.io/'}
{'Homepage': 'https://github.com/benjaminp/six'}
{'Homepage': 'https://github.com/boto/botocore'}
{'Homepage': 'http://python-requests.org'}
{'Homepage': 'https://dateutil.readthedocs.io'}
{'Homepage': 'https://pip.pypa.io/'}
{'Homepage': 'https://github.com/boto/s3transfer'}
{'Homepage': 'https://certifi.io/'}
{'Homepage': 'https://github.com/kjd/idna'}
{'Homepage': 'http://docutils.sourceforge.net/'}
{'Homepage': 'https://github.com/chardet/chardet'}
{'Homepage': 'https://github.com/etingof/pyasn1'}
{'Download': 'https://pypi.org/project/PyYAML/', 'Homepage': 'https://github.com/yaml/pyyaml'}
{'Homepage': 'https://stuvel.eu/rsa'}
{'Homepage': 'https://github.com/jmespath/jmespath.py'}
{'Documentation': 'https://setuptools.readthedocs.io/', 'Homepage': 'https://github.com/pypa/setuptools'}
{'Download': 'https://pypi.org/project/pytz/', 'Homepage': 'http://pythonhosted.org/pytz'}
{'Homepage': 'https://github.com/agronholm/pythonfutures'}
{'Homepage': 'https://github.com/tartley/colorama'}
{'Homepage': 'http://aws.amazon.com/cli/'}
...

Full list:

hugovk, Nov 03 '19 19:11

Multipart zip of /Users/hugo/Library/Caches/source-finder/ containing the top 5,000 (plus 5) JSON metadata files, created with zip source-finder.zip --out cachefiles.zip -s 10m

Rename each .z0X.zip file to .z0X before uncompressing.

hugovk, Nov 03 '19 20:11

And a count of all the project_urls keys:

Counter({'Homepage': 4881,
         'Download': 1164,
         'Documentation': 238,
         'Issue tracker': 116,
         'Source': 95,
         'Tracker': 41,
         'Source Code': 38,
         'Bug Tracker': 36,
         'Repository': 31,
         'Changelog': 28,
         'Bug Reports': 25,
         'Funding': 18,
         'Issues': 15,
         'Issue Tracker': 14,
         'Code': 14,
         'CI: Travis': 9,
         'GitHub: issues': 7,
         'GitHub: repo': 7,
         'Source code': 7,
         'CI: AppVeyor': 5,
         'Docs: RTD': 5,
         'Docs': 4,
         'CI: Circle': 4,
         'Donation': 4,
         'GitHub': 4,
         'Chat: Gitter': 3,
         'Coverage: codecov': 3,
         'Tidelift': 3,
         'Github': 3,
         'Travis CI': 3,
         'Say Thanks!': 3,
         'CI: Shippable': 2,
         'Website': 2,
         'Code of Conduct': 2,
         'Mailing lists': 2,
         'Change log': 2,
         'Release Management': 2,
         'Webpage': 2,
         'CI': 2,
         'PyPI': 1,
         'Test Coverage': 1,
         'Tests': 1,
         'Packaging tutorial': 1,
         'Twine documentation': 1,
         'Twine source': 1,
         'CI: CircleCI': 1,
         'Support': 1,
         'Benchmarks': 1,
         'Wiki': 1,
         'Github repo': 1,
         'Wikipedia': 1,
         'Blog': 1,
         'Donate': 1,
         'Tidelift Subscription': 1,
         'Dev Docs': 1,
         'Discord': 1,
         'Forum': 1,
         'Code Coverage': 1,
         'Continuous Integration': 1,
         'Mailing List': 1,
         'Chat': 1,
         'Community': 1,
         'Gitter': 1,
         'bugs': 1,
         'repository': 1,
         'Issue Tracking': 1,
         'Discord server': 1})
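A count like the one above can be sketched with collections.Counter. The helper name count_all_keys is hypothetical, and the input is assumed to be one project_urls dict (or None) per package; the second return value is how many packages define any project_urls at all.

```python
from collections import Counter


def count_all_keys(all_project_urls):
    """Return (key counts, number of packages with a non-empty project_urls)."""
    key_counts = Counter()
    with_urls = 0
    for urls in all_project_urls:
        if urls:  # skips None and empty dicts
            with_urls += 1
            key_counts.update(urls.keys())
    return key_counts, with_urls
```

Unlike the repo tally, every key of every package is counted here, so the totals can exceed the number of packages.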

hugovk, Nov 03 '19 20:11

@hugovk, I think https://github.com/jayvdb/pypidb will be helpful. Note the repos are still getting set up, and there is currently a dependency on https://github.com/jayvdb/https-everywhere-py master, which I will fix by getting a new release out within a day or two.

jayvdb, Mar 20 '20 09:03

Looks good! Thanks!

hugovk, Mar 20 '20 09:03

August 2020

Updated list of the most popular project_urls keys in the top 4,000 downloaded packages (via https://github.com/hugovk/pypi-tools/pull/20#issue-493725680):

$ python3 project_urls.py -n 4000
Load data/top-pypi-packages.json...
Find project_urls...
100%|████████████████████████████████| 4000/4000 [00:07<00:00, 524.71project/s]
Counter({'Homepage': 3916,
         'Download': 778,
         'Documentation': 240,
         'Source': 152,
         'Changelog': 70,
         'Repository': 63,
         'Bug Tracker': 62,
         'Source Code': 60,
         'Tracker': 55,
         'Issue tracker': 39,
         'Issue Tracker': 30,
         'GitHub': 28,
         'Code': 26,
         'Issues': 21,
         'Funding': 20,
         'Bug Reports': 17,
         'Bug-Tracker': 8,
         'Twitter': 8,
         'CI: Travis': 7,
         'Source-Code': 7,
         'Docs': 6,
         'GitHub: issues': 6,
         'GitHub: repo': 6,
         'Github': 6,
         'Source code': 6,
         'bugs': 6,
         'repository': 6,
         'Docs: RTD': 5,
         'Donation': 5,
         'CI: AppVeyor': 3,
         'CI: Circle': 3,
         'Chat: Gitter': 3,
         'Code of Conduct': 3,
         'Coverage: codecov': 3,
         'Donate': 3,
         'Mailing List': 3,
         'Say Thanks!': 3,
         'Tidelift': 3,
         'Travis CI': 3,
         'CI': 2,
         'CI: GitHub': 2,
         'CI: Shippable': 2,
         'Change log': 2,
         'Chat': 2,
         'Download RPMs': 2,
         'Forum': 2,
         'Mailing lists': 2,
         'Release Management': 2,
         'Release notes': 2,
         'Tidelift: funding': 2,
         'Website': 2,
         'Benchmarks': 1,
         'Blog': 1,
         'Bug tracker': 1,
         'Bugs': 1,
         'CI: Azure Pipelines': 1,
         'CI: CircleCI': 1,
         'CI: GitHub Workflows': 1,
         'CI: Zuul': 1,
         'Code Coverage': 1,
         'Commercial License': 1,
         'Community': 1,
         'Conda-Forge': 1,
         'Continuous Integration': 1,
         'Coverage': 1,
         'Dev Docs': 1,
         'Discord': 1,
         'Discussions': 1,
         'Downloads': 1,
         'Examples': 1,
         'Feedstock': 1,
         'Further Documentation': 1,
         'Github repo': 1,
         'Help/Questions': 1,
         'History': 1,
         'License': 1,
         'Online Demo': 1,
         'Packaging tutorial': 1,
         'PyPI': 1,
         'Read the Docs': 1,
         'Release Notes': 1,
         'Releases': 1,
         'Support': 1,
         'Test Coverage': 1,
         'Tests': 1,
         'Tutorials': 1,
         'Twine documentation': 1,
         'Twine source': 1,
         "What's New": 1,
         'Wiki': 1,
         'Wikipedia': 1,
         'conda': 1})

Number with project_urls: 3925/4000

hugovk, Sep 27 '20 14:09

June 2022

Updated list of the most popular project_urls keys in the top 5,000 downloaded packages:

python3 pypi_fields.py --number 5000 --format markdown

Top 10

project_urls Count
Homepage 4845
Download 738
Documentation 711
Source 400
Bug Tracker 240
Source Code 237
Repository 233
Changelog 159
Tracker 150
Issue tracker 131
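The internals of pypi_fields.py are not shown here, so the following is only a guess at how a Counter could be rendered as a Markdown table like the Top 10 above. The function name to_markdown_table and its limit parameter are assumptions made for this example.

```python
from collections import Counter


def to_markdown_table(counts, limit=None):
    """Render key counts as a two-column Markdown table, most common first."""
    lines = ["| project_urls | Count |", "| --- | --- |"]
    # most_common(None) returns every entry; most_common(10) gives a top 10.
    for key, count in counts.most_common(limit):
        lines.append(f"| {key} | {count} |")
    return "\n".join(lines)
```

Calling it with limit=10 would give the short table and limit=None the full list.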

Full list

project_urls Count
Homepage 4845
Download 738
Documentation 711
Source 400
Bug Tracker 240
Source Code 237
Repository 233
Changelog 159
Tracker 150
Issue tracker 131
Twitter 83
Issue Tracker 79
Changes 67
Chat 62
Funding 59
GitHub 56
Issues 55
YouTube 44
Slack Chat 43
Bug Reports 42
Code 29
CI 24
Source code 15
User Support 13
Discussions 12
Donate 12
GitHub: issues 12
GitHub: repo 12
Github 11
Release Management 10
homepage 10
Bug-Tracker 9
Docs: RTD 9
Release notes 9
documentation 9
Chat: Gitter 8
Code of Conduct 8
Docs 8
Release Notes 8
Source-Code 8
Tidelift 8
repository 8
Coverage: codecov 7
Donation 7
Say Thanks! 7
CI: GitHub 6
Coverage 6
Gitter 6
Mailing lists 6
Blog 5
CI: GitHub Actions 5
Change Log 5
Home 5
Ko-fi 5
Mailing List 5
changelog 5
Discord 4
PyPI 4
Releases 4
Wiki 4
Bug tracker 3
CI: Travis 3
Examples 3
Forum 3
History 3
Slack 3
Author 2
CI: Github Actions 2
Community 2
Continuous Integration 2
Docs: Changelog 2
Download RPMs 2
Downloads 2
GitHub Project 2
GitHub: discussions 2
Home Page 2
Home-page 2
News 2
Red Team Report 2
Sources 2
Telegram Channel 2
Telegram Chat 2
Test Coverage 2
Tests 2
Tidelift: funding 2
Website 2
.git 1
Benchmarks 1
Browse Source 1
Bug Reporting 1
Bug_Tracker 1
Bugs 1
CI/CD 1
CI: AppVeyor 1
CI: Azure Pipelines 1
CI: Circle 1
CI: CircleCI 1
CI: GA 1
CI: Shippable 1
Censys Homepage 1
Censys Search 1
Change log 1
CircleCI 1
Citation 1
Code Coverage 1
Codecov 1
Commercial License 1
Company 1
Conda-Forge 1
Contact 1
Container Image: DockerHub 1
Contribute! 1
Coverage: Codecov 1
Discord Server 1
Discord server 1
Discussion forum 1
Distribution 1
Docs: Contributing 1
Docs: Dev 1
Docs: Intro 1
Docs: Technical Reference 1
Docs: User Guide 1
Documentation-latest 1
Documentation-stable 1
End-User License Agreement 1
Enterprise Support 1
Example Report 1
Feedstock 1
Further Documentation 1
Git Clone URL 1
GitHub repository 1
Github repo 1
Help/Questions 1
Installation 1
License Texts 1
Live demo 1
Mailing list 1
Maillist 1
Matrix Profile Foundation 1
Notebook Examples 1
Online Demo 1
Packaging tutorial 1
Panel Examples 1
Parent Project 1
PyPi 1
Q & A 1
RDKit 1
RDKit on Github 1
Read the Docs 1
Reference 1
Released Versions 1
Report Issues 1
Reviews 1
Samples 1
SonarCloud 1
Sponsor 1
Style guide 1
Support 1
Travis CI 1
Tutorials 1
Webpage 1
What's New 1
Wikipedia 1
Youtube 1
all files 1
blog 1
bugs 1
conda 1
conda-forge 1
download 1
funding 1
github 1
github wiki(under development) 1
gitlab 1
help 1
just a chat to talk about python 1
made possible by 1
os_sys homepage 1
os_sys online 1
read the docs 1
server documentation 1
source 1
startpage 1
tracker 1
want to help 1

Projects with project_urls: 4902/5000

Groups

And grouping some variants, we can see some popular choices:

Homepage

project_urls Count
Homepage 4845
homepage 10
Home 5
Home Page 2
Home-page 2
Website 2
Censys Homepage 1
os_sys homepage 1
startpage 1
Webpage 1

Download

project_urls Count
Download 738
Download RPMs 2
Downloads 2
download 1

Documentation

project_urls Count
Documentation 711
Docs: RTD 9
documentation 9
Docs 8
Docs: Contributing 1
Docs: Dev 1
Docs: Intro 1
Docs: Technical Reference 1
Docs: User Guide 1
Documentation-latest 1
Documentation-stable 1
Further Documentation 1
Read the Docs 1
read the docs 1
server documentation 1

Source

project_urls Count
Source 400
Source Code 237
Repository 233
GitHub 56
Code 29
Source code 15
GitHub: repo 12
Github 11
Source-Code 8
repository 8
Sources 2
Browse Source 1
source 1
.git 1
github 1
github wiki(under development) 1
gitlab 1
Git Clone URL 1
GitHub repository 1
Github repo 1
RDKit on Github 1
all files 1

Bug Tracker

project_urls Count
Bug Tracker 240
Tracker 150
Issue tracker 131
Issue Tracker 79
Issues 55
Bug Reports 42
User Support 13
GitHub: issues 12
Bug-Tracker 9
Bug Reporting 1
Bug_Tracker 1
Bug tracker 3
tracker 1
Bugs 1
bugs 1
help 1
Report Issues 1

Changelog

project_urls Count
Changelog 159
Changes 67
Release Management 10
Release notes 9
Release Notes 8
changelog 5
Change Log 5
Releases 4
History 3
Docs: Changelog 2
Change log 1
Released Versions 1
What's New 1

Chat

project_urls Count
Chat 62
Slack Chat 43
Discussions 12
Gitter 6
Chat: Gitter 8
Discord 4
Forum 3
Slack 3
GitHub: discussions 2
Community 2
Telegram Channel 2
Telegram Chat 2
Discord Server 1
Discord server 1
Discussion forum 1
just a chat to talk about python 1

Funding

project_urls Count
Funding 59
Donate 12
Tidelift 8
Donation 7
Ko-fi 5
Tidelift: funding 2
funding 1
Sponsor 1

CI

project_urls Count
CI 24
CI: GitHub 6
CI: GitHub Actions 5
CI: Github Actions 2
Continuous Integration 2
CI: Travis 3
CI/CD 1
CI: AppVeyor 1
CI: Azure Pipelines 1
CI: Circle 1
CI: CircleCI 1
CI: GA 1
CI: Shippable 1
CircleCI 1
Travis CI 1
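The groups above were put together by hand, but the purely cosmetic variants (case, hyphens, underscores) could be merged automatically. A minimal sketch, with hypothetical helper names normalize and merge_variants:

```python
from collections import Counter
import re


def normalize(key):
    """Lower-case a key and treat hyphens/underscores as spaces."""
    return re.sub(r"[-_\s]+", " ", key).strip().lower()


def merge_variants(counts):
    """Sum the counts of keys that differ only in case or separators."""
    merged = Counter()
    for key, count in counts.items():
        merged[normalize(key)] += count
    return merged
```

With the June 2022 numbers, Bug Tracker (240), Bug-Tracker (9), Bug tracker (3) and Bug_Tracker (1) collapse into a single "bug tracker" entry of 253. Semantic groupings, such as putting Repository and GitHub under Source, would still need a hand-written mapping.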

hugovk, Jul 20 '22 10:07