pypi-tools
Investigation into "canonical" link for a PyPI repo link
Summary: use Source
In addition to url (alias homepage), packages on PyPI can have this metadata:
project_urls: An arbitrary map of URL names to hyperlinks, allowing more extensible documentation of where various resources can be found than the simple url and download_url options provide.
The url homepage is added to project_urls as Homepage. For example, Pillow doesn't define any project_urls but does have url="http://python-pillow.org", and https://pypi.org/pypi/Pillow/json includes:
"home_page": "http://python-pillow.org",
"project_urls": {
"Homepage": "http://python-pillow.org"
},
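A minimal sketch of reading these two fields from the JSON API. The function names are made up for illustration, and it assumes both fields sit under the response's top-level "info" object, which the excerpt above does not show:

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def extract_urls(metadata: dict) -> tuple[str | None, dict[str, str]]:
    """Return (home_page, project_urls) from a parsed PyPI JSON response."""
    info = metadata.get("info", {})
    # project_urls may be null when a project defines no URLs at all,
    # so fall back to an empty dict.
    return info.get("home_page"), info.get("project_urls") or {}


def fetch_metadata(name: str) -> dict:
    """Download and parse https://pypi.org/pypi/<name>/json (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as response:
        return json.load(response)
```

Calling `extract_urls(fetch_metadata("Pillow"))` should give the values shown above, as long as the project's metadata has not changed since.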
Many projects have a link to their GitHub (or GitLab or Bitbucket etc.) repos as the homepage. For those that include an arbitrary link to a source repo, what is the most common one, when not the homepage?
Checking the current top 5,000 packages, here is the project_urls key under which a source repo was found (defined as a URL containing one of github.com, gitlab.com, bitbucket.org or bitbucket.com):
Counter({'Homepage': 3711,
None: 1047,
'Source': 95,
'Download': 63,
'Source Code': 38,
'Code': 14,
'Issue Tracker': 5,
'Repository': 5,
'GitHub: issues': 4,
'Github': 3,
'Bug Tracker': 3,
'Bug Reports': 2,
'Issue tracker': 2,
'Source code': 2,
'Twine source': 1,
'Issues': 1,
'Github repo': 1,
'Change log': 1,
'Changelog': 1,
'GitHub': 1})
Some of these are specific things, like links to tarball downloads, or issue trackers. But the most common ones for a repo homepage are Source, Source Code and Code.
- I'll use Source for adding new ones.
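The tally above could be produced roughly like this. This is a sketch, not the actual script: the function names are hypothetical, and taking the first matching key in dictionary order is an assumption about how ties are resolved.

```python
from collections import Counter

# Hosts taken from the definition above; plain substring matching is an assumption.
REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "bitbucket.com")


def find_repo_key(project_urls):
    """Return the first key whose URL mentions a known repo host, else None."""
    for key, url in project_urls.items():
        if any(host in url for host in REPO_HOSTS):
            return key
    return None


def count_repo_keys(all_project_urls):
    """Tally, across packages, the key under which a repo link was found."""
    return Counter(find_repo_key(urls) for urls in all_project_urls)


# Three entries copied from the project_urls preview further down.
sample = [
    {"Homepage": "https://urllib3.readthedocs.io/"},
    {"Homepage": "https://github.com/benjaminp/six"},
    {"Download": "https://pypi.org/project/PyYAML/",
     "Homepage": "https://github.com/yaml/pyyaml"},
]
print(count_repo_keys(sample))  # Counter({'Homepage': 2, None: 1})
```

Packages with no matching URL are counted under None, which matches the None row in the output above.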
SCM links in project_urls, preview:
Load data/top-repos.json...
Load data/top-pypi-packages.json...
Already done: 0
Find new repos...
Homepage https://github.com/benjaminp/six
Homepage https://github.com/boto/botocore
Homepage https://github.com/boto/s3transfer
Homepage https://github.com/kjd/idna
Homepage https://github.com/chardet/chardet
Homepage https://github.com/etingof/pyasn1
Homepage https://github.com/yaml/pyyaml
Homepage https://github.com/jmespath/jmespath.py
Homepage https://github.com/pypa/setuptools
Homepage https://github.com/agronholm/pythonfutures
Homepage https://github.com/tartley/colorama
Homepage https://github.com/boto/boto3
Homepage https://github.com/simplejson/simplejson
Source Code https://github.com/numpy/numpy
Homepage https://github.com/pypa/wheel
Download https://github.com/protocolbuffers/protobuf/releases
...
Homepage https://github.com/broadinstitute/keras-resnet
Homepage https://github.com/CyberZHG/keras-position-wise-feed-forward
Homepage https://github.com/makinacorpus/django-admin-watchdog
Old repos: 0
New repos: 3953
Not found: 1047
Counter({'Homepage': 3711,
None: 1047,
'Source': 95,
'Download': 63,
'Source Code': 38,
'Code': 14,
'Issue Tracker': 5,
'Repository': 5,
'GitHub: issues': 4,
'Github': 3,
'Bug Tracker': 3,
'Bug Reports': 2,
'Issue tracker': 2,
'Source code': 2,
'Twine source': 1,
'Issues': 1,
'Github repo': 1,
'Change log': 1,
'Changelog': 1,
'GitHub': 1})
Full list:
The project_urls for each of the top 5,000, preview:
{'Homepage': 'https://urllib3.readthedocs.io/'}
{'Homepage': 'https://github.com/benjaminp/six'}
{'Homepage': 'https://github.com/boto/botocore'}
{'Homepage': 'http://python-requests.org'}
{'Homepage': 'https://dateutil.readthedocs.io'}
{'Homepage': 'https://pip.pypa.io/'}
{'Homepage': 'https://github.com/boto/s3transfer'}
{'Homepage': 'https://certifi.io/'}
{'Homepage': 'https://github.com/kjd/idna'}
{'Homepage': 'http://docutils.sourceforge.net/'}
{'Homepage': 'https://github.com/chardet/chardet'}
{'Homepage': 'https://github.com/etingof/pyasn1'}
{'Download': 'https://pypi.org/project/PyYAML/', 'Homepage': 'https://github.com/yaml/pyyaml'}
{'Homepage': 'https://stuvel.eu/rsa'}
{'Homepage': 'https://github.com/jmespath/jmespath.py'}
{'Documentation': 'https://setuptools.readthedocs.io/', 'Homepage': 'https://github.com/pypa/setuptools'}
{'Download': 'https://pypi.org/project/pytz/', 'Homepage': 'http://pythonhosted.org/pytz'}
{'Homepage': 'https://github.com/agronholm/pythonfutures'}
{'Homepage': 'https://github.com/tartley/colorama'}
{'Homepage': 'http://aws.amazon.com/cli/'}
...
Full list:
Multipart zip of /Users/hugo/Library/Caches/source-finder/ containing the top 5,000 (plus 5) JSON metadata files, created with zip source-finder.zip --out cachefiles.zip -s 10m
Rename the .z0X.zip to .z0X before uncompressing.
And a count of all the project_urls keys:
Counter({'Homepage': 4881,
'Download': 1164,
'Documentation': 238,
'Issue tracker': 116,
'Source': 95,
'Tracker': 41,
'Source Code': 38,
'Bug Tracker': 36,
'Repository': 31,
'Changelog': 28,
'Bug Reports': 25,
'Funding': 18,
'Issues': 15,
'Issue Tracker': 14,
'Code': 14,
'CI: Travis': 9,
'GitHub: issues': 7,
'GitHub: repo': 7,
'Source code': 7,
'CI: AppVeyor': 5,
'Docs: RTD': 5,
'Docs': 4,
'CI: Circle': 4,
'Donation': 4,
'GitHub': 4,
'Chat: Gitter': 3,
'Coverage: codecov': 3,
'Tidelift': 3,
'Github': 3,
'Travis CI': 3,
'Say Thanks!': 3,
'CI: Shippable': 2,
'Website': 2,
'Code of Conduct': 2,
'Mailing lists': 2,
'Change log': 2,
'Release Management': 2,
'Webpage': 2,
'CI': 2,
'PyPI': 1,
'Test Coverage': 1,
'Tests': 1,
'Packaging tutorial': 1,
'Twine documentation': 1,
'Twine source': 1,
'CI: CircleCI': 1,
'Support': 1,
'Benchmarks': 1,
'Wiki': 1,
'Github repo': 1,
'Wikipedia': 1,
'Blog': 1,
'Donate': 1,
'Tidelift Subscription': 1,
'Dev Docs': 1,
'Discord': 1,
'Forum': 1,
'Code Coverage': 1,
'Continuous Integration': 1,
'Mailing List': 1,
'Chat': 1,
'Community': 1,
'Gitter': 1,
'bugs': 1,
'repository': 1,
'Issue Tracking': 1,
'Discord server': 1})
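For reference, a count like the one above can be built by feeding every package's keys into a single Counter. This is only a sketch with a hypothetical function name, using three entries from the preview earlier in this thread:

```python
from collections import Counter


def count_all_keys(all_project_urls):
    """Count how often each project_urls key appears across packages."""
    counts = Counter()
    for project_urls in all_project_urls:
        # Each key appears at most once per package, so this counts packages per key.
        counts.update(project_urls.keys())
    return counts


# Entries copied from the project_urls preview above.
sample = [
    {"Download": "https://pypi.org/project/PyYAML/",
     "Homepage": "https://github.com/yaml/pyyaml"},
    {"Documentation": "https://setuptools.readthedocs.io/",
     "Homepage": "https://github.com/pypa/setuptools"},
    {"Download": "https://pypi.org/project/pytz/",
     "Homepage": "http://pythonhosted.org/pytz"},
]
print(count_all_keys(sample).most_common())
# [('Homepage', 3), ('Download', 2), ('Documentation', 1)]
```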
@hugovk, I think https://github.com/jayvdb/pypidb will be helpful. Note the repos are still getting set up, and there is currently a dependency on https://github.com/jayvdb/https-everywhere-py master, which I will fix by getting a new release out within a day or two.
Looks good! Thanks!
August 2020
Updated list of most popular project_urls keys in the top 4,000 downloaded packages (via https://github.com/hugovk/pypi-tools/pull/20#issue-493725680):
$ python3 project_urls.py -n 4000
Load data/top-pypi-packages.json...
Find project_urls...
100%|████████████████████████████████| 4000/4000 [00:07<00:00, 524.71project/s]
Counter({'Homepage': 3916,
'Download': 778,
'Documentation': 240,
'Source': 152,
'Changelog': 70,
'Repository': 63,
'Bug Tracker': 62,
'Source Code': 60,
'Tracker': 55,
'Issue tracker': 39,
'Issue Tracker': 30,
'GitHub': 28,
'Code': 26,
'Issues': 21,
'Funding': 20,
'Bug Reports': 17,
'Bug-Tracker': 8,
'Twitter': 8,
'CI: Travis': 7,
'Source-Code': 7,
'Docs': 6,
'GitHub: issues': 6,
'GitHub: repo': 6,
'Github': 6,
'Source code': 6,
'bugs': 6,
'repository': 6,
'Docs: RTD': 5,
'Donation': 5,
'CI: AppVeyor': 3,
'CI: Circle': 3,
'Chat: Gitter': 3,
'Code of Conduct': 3,
'Coverage: codecov': 3,
'Donate': 3,
'Mailing List': 3,
'Say Thanks!': 3,
'Tidelift': 3,
'Travis CI': 3,
'CI': 2,
'CI: GitHub': 2,
'CI: Shippable': 2,
'Change log': 2,
'Chat': 2,
'Download RPMs': 2,
'Forum': 2,
'Mailing lists': 2,
'Release Management': 2,
'Release notes': 2,
'Tidelift: funding': 2,
'Website': 2,
'Benchmarks': 1,
'Blog': 1,
'Bug tracker': 1,
'Bugs': 1,
'CI: Azure Pipelines': 1,
'CI: CircleCI': 1,
'CI: GitHub Workflows': 1,
'CI: Zuul': 1,
'Code Coverage': 1,
'Commercial License': 1,
'Community': 1,
'Conda-Forge': 1,
'Continuous Integration': 1,
'Coverage': 1,
'Dev Docs': 1,
'Discord': 1,
'Discussions': 1,
'Downloads': 1,
'Examples': 1,
'Feedstock': 1,
'Further Documentation': 1,
'Github repo': 1,
'Help/Questions': 1,
'History': 1,
'License': 1,
'Online Demo': 1,
'Packaging tutorial': 1,
'PyPI': 1,
'Read the Docs': 1,
'Release Notes': 1,
'Releases': 1,
'Support': 1,
'Test Coverage': 1,
'Tests': 1,
'Tutorials': 1,
'Twine documentation': 1,
'Twine source': 1,
"What's New": 1,
'Wiki': 1,
'Wikipedia': 1,
'conda': 1})
Number with project_urls: 3925/4000
June 2022
Updated list of most popular project_urls keys in the top 5,000 downloaded packages:
python3 pypi_fields.py --number 5000 --format markdown
Top 10
| project_urls | Count |
|---|---|
| Homepage | 4845 |
| Download | 738 |
| Documentation | 711 |
| Source | 400 |
| Bug Tracker | 240 |
| Source Code | 237 |
| Repository | 233 |
| Changelog | 159 |
| Tracker | 150 |
| Issue tracker | 131 |
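A table in this shape can be rendered from a Counter in a few lines. The helper below is hypothetical and is not taken from pypi_fields.py; it only illustrates the idea:

```python
from collections import Counter


def to_markdown(counts, heading="project_urls", limit=None):
    """Render the most common keys as a two-column markdown table."""
    lines = [f"| {heading} | Count |", "|---|---|"]
    # most_common(None) returns every entry, sorted by descending count.
    for key, count in counts.most_common(limit):
        lines.append(f"| {key} | {count} |")
    return "\n".join(lines)


# Counts copied from the first three rows of the table above.
counts = Counter({"Homepage": 4845, "Download": 738, "Documentation": 711})
print(to_markdown(counts, limit=2))
```

Passing `limit=10` would give a top-10 table and `limit=None` the full list.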
Full list
| project_urls | Count |
|---|---|
| Homepage | 4845 |
| Download | 738 |
| Documentation | 711 |
| Source | 400 |
| Bug Tracker | 240 |
| Source Code | 237 |
| Repository | 233 |
| Changelog | 159 |
| Tracker | 150 |
| Issue tracker | 131 |
|  | 83 |
| Issue Tracker | 79 |
| Changes | 67 |
| Chat | 62 |
| Funding | 59 |
| GitHub | 56 |
| Issues | 55 |
| YouTube | 44 |
| Slack Chat | 43 |
| Bug Reports | 42 |
| Code | 29 |
| CI | 24 |
| Source code | 15 |
| User Support | 13 |
| Discussions | 12 |
| Donate | 12 |
| GitHub: issues | 12 |
| GitHub: repo | 12 |
| Github | 11 |
| Release Management | 10 |
| homepage | 10 |
| Bug-Tracker | 9 |
| Docs: RTD | 9 |
| Release notes | 9 |
| documentation | 9 |
| Chat: Gitter | 8 |
| Code of Conduct | 8 |
| Docs | 8 |
| Release Notes | 8 |
| Source-Code | 8 |
| Tidelift | 8 |
| repository | 8 |
| Coverage: codecov | 7 |
| Donation | 7 |
| Say Thanks! | 7 |
| CI: GitHub | 6 |
| Coverage | 6 |
| Gitter | 6 |
| Mailing lists | 6 |
| Blog | 5 |
| CI: GitHub Actions | 5 |
| Change Log | 5 |
| Home | 5 |
| Ko-fi | 5 |
| Mailing List | 5 |
| changelog | 5 |
| Discord | 4 |
| PyPI | 4 |
| Releases | 4 |
| Wiki | 4 |
| Bug tracker | 3 |
| CI: Travis | 3 |
| Examples | 3 |
| Forum | 3 |
| History | 3 |
| Slack | 3 |
| Author | 2 |
| CI: Github Actions | 2 |
| Community | 2 |
| Continuous Integration | 2 |
| Docs: Changelog | 2 |
| Download RPMs | 2 |
| Downloads | 2 |
| GitHub Project | 2 |
| GitHub: discussions | 2 |
| Home Page | 2 |
| Home-page | 2 |
| News | 2 |
| Red Team Report | 2 |
| Sources | 2 |
| Telegram Channel | 2 |
| Telegram Chat | 2 |
| Test Coverage | 2 |
| Tests | 2 |
| Tidelift: funding | 2 |
| Website | 2 |
| .git | 1 |
| Benchmarks | 1 |
| Browse Source | 1 |
| Bug Reporting | 1 |
| Bug_Tracker | 1 |
| Bugs | 1 |
| CI/CD | 1 |
| CI: AppVeyor | 1 |
| CI: Azure Pipelines | 1 |
| CI: Circle | 1 |
| CI: CircleCI | 1 |
| CI: GA | 1 |
| CI: Shippable | 1 |
| Censys Homepage | 1 |
| Censys Search | 1 |
| Change log | 1 |
| CircleCI | 1 |
| Citation | 1 |
| Code Coverage | 1 |
| Codecov | 1 |
| Commercial License | 1 |
| Company | 1 |
| Conda-Forge | 1 |
| Contact | 1 |
| Container Image: DockerHub | 1 |
| Contribute! | 1 |
| Coverage: Codecov | 1 |
| Discord Server | 1 |
| Discord server | 1 |
| Discussion forum | 1 |
| Distribution | 1 |
| Docs: Contributing | 1 |
| Docs: Dev | 1 |
| Docs: Intro | 1 |
| Docs: Technical Reference | 1 |
| Docs: User Guide | 1 |
| Documentation-latest | 1 |
| Documentation-stable | 1 |
| End-User License Agreement | 1 |
| Enterprise Support | 1 |
| Example Report | 1 |
| Feedstock | 1 |
| Further Documentation | 1 |
| Git Clone URL | 1 |
| GitHub repository | 1 |
| Github repo | 1 |
| Help/Questions | 1 |
| Installation | 1 |
| License Texts | 1 |
| Live demo | 1 |
| Mailing list | 1 |
| Maillist | 1 |
| Matrix Profile Foundation | 1 |
| Notebook Examples | 1 |
| Online Demo | 1 |
| Packaging tutorial | 1 |
| Panel Examples | 1 |
| Parent Project | 1 |
| PyPi | 1 |
| Q & A | 1 |
| RDKit | 1 |
| RDKit on Github | 1 |
| Read the Docs | 1 |
| Reference | 1 |
| Released Versions | 1 |
| Report Issues | 1 |
| Reviews | 1 |
| Samples | 1 |
| SonarCloud | 1 |
| Sponsor | 1 |
| Style guide | 1 |
| Support | 1 |
| Travis CI | 1 |
| Tutorials | 1 |
| Webpage | 1 |
| What's New | 1 |
| Wikipedia | 1 |
| Youtube | 1 |
| all files | 1 |
| blog | 1 |
| bugs | 1 |
| conda | 1 |
| conda-forge | 1 |
| download | 1 |
| funding | 1 |
| github | 1 |
| github wiki(under development) | 1 |
| gitlab | 1 |
| help | 1 |
| just a chat to talk about python | 1 |
| made possible by | 1 |
| os_sys homepage | 1 |
| os_sys online | 1 |
| read the docs | 1 |
| server documentation | 1 |
| source | 1 |
| startpage | 1 |
| tracker | 1 |
| want to help | 1 |
Projects with project_urls: 4902/5000
Groups
And grouping some variants, we can see some popular choices:
Homepage
| project_urls | Count |
|---|---|
| Homepage | 4845 |
| homepage | 10 |
| Home | 5 |
| Home Page | 2 |
| Home-page | 2 |
| Website | 2 |
| Censys Homepage | 1 |
| os_sys homepage | 1 |
| startpage | 1 |
| Webpage | 1 |
Download
| project_urls | Count |
|---|---|
| Download | 738 |
| Download RPMs | 2 |
| Downloads | 2 |
| download | 1 |
Documentation
| project_urls | Count |
|---|---|
| Documentation | 711 |
| Docs: RTD | 9 |
| documentation | 9 |
| Docs | 8 |
| Docs: Contributing | 1 |
| Docs: Dev | 1 |
| Docs: Intro | 1 |
| Docs: Technical Reference | 1 |
| Docs: User Guide | 1 |
| Documentation-latest | 1 |
| Documentation-stable | 1 |
| Further Documentation | 1 |
| Read the Docs | 1 |
| read the docs | 1 |
| server documentation | 1 |
Source
| project_urls | Count |
|---|---|
| Source | 400 |
| Source Code | 237 |
| Repository | 233 |
| GitHub | 56 |
| Code | 29 |
| Source code | 15 |
| GitHub: repo | 12 |
| Github | 11 |
| Source-Code | 8 |
| repository | 8 |
| Sources | 2 |
| Browse Source | 1 |
| source | 1 |
| .git | 1 |
| github | 1 |
| github wiki(under development) | 1 |
| gitlab | 1 |
| Git Clone URL | 1 |
| GitHub repository | 1 |
| Github repo | 1 |
| RDKit on Github | 1 |
| all files | 1 |
Bug Tracker
| project_urls | Count |
|---|---|
| Bug Tracker | 240 |
| Tracker | 150 |
| Issue tracker | 131 |
| Issue Tracker | 79 |
| Issues | 55 |
| Bug Reports | 42 |
| User Support | 13 |
| GitHub: issues | 12 |
| Bug-Tracker | 9 |
| Bug Reporting | 1 |
| Bug_Tracker | 1 |
| Bug tracker | 3 |
| tracker | 1 |
| Bugs | 1 |
| bugs | 1 |
| help | 1 |
| Report Issues | 1 |
Changelog
| project_urls | Count |
|---|---|
| Changelog | 159 |
| Changes | 67 |
| Release Management | 10 |
| Release notes | 9 |
| Release Notes | 8 |
| changelog | 5 |
| Change Log | 5 |
| Releases | 4 |
| History | 3 |
| Docs: Changelog | 2 |
| Change log | 1 |
| Released Versions | 1 |
| What's New | 1 |
Chat
| project_urls | Count |
|---|---|
| Chat | 62 |
| Slack Chat | 43 |
| Discussions | 12 |
| Gitter | 6 |
| Chat: Gitter | 8 |
| Discord | 4 |
| Forum | 3 |
| Slack | 3 |
| GitHub: discussions | 2 |
| Community | 2 |
| Telegram Channel | 2 |
| Telegram Chat | 2 |
| Discord Server | 1 |
| Discord server | 1 |
| Discussion forum | 1 |
| just a chat to talk about python | 1 |
Funding
| project_urls | Count |
|---|---|
| Funding | 59 |
| Donate | 12 |
| Tidelift | 8 |
| Donation | 7 |
| Ko-fi | 5 |
| Tidelift: funding | 2 |
| funding | 1 |
| Sponsor | 1 |
CI
| project_urls | Count |
|---|---|
| CI | 24 |
| CI: GitHub | 6 |
| CI: GitHub Actions | 5 |
| CI: Github Actions | 2 |
| Continuous Integration | 2 |
| CI: Travis | 3 |
| CI/CD | 1 |
| CI: AppVeyor | 1 |
| CI: Azure Pipelines | 1 |
| CI: Circle | 1 |
| CI: CircleCI | 1 |
| CI: GA | 1 |
| CI: Shippable | 1 |
| CircleCI | 1 |
| Travis CI | 1 |
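The groups above look hand-curated. One way to automate part of that is to normalise each label and look it up in a small mapping. The mapping and function names below are hypothetical and cover only a handful of labels:

```python
from collections import Counter

# Hypothetical mapping from a normalised label to a group name.
# Only a few labels from the tables above are covered.
GROUP_OF = {
    "source": "Source",
    "source code": "Source",
    "repository": "Source",
    "bug tracker": "Bug Tracker",
    "issue tracker": "Bug Tracker",
}


def normalise(label):
    """Lower-case a label and treat '-' and '_' as spaces."""
    return " ".join(label.lower().replace("-", " ").replace("_", " ").split())


def group_counts(counts):
    """Sum counts per group; unmapped labels keep their own name."""
    grouped = Counter()
    for label, count in counts.items():
        grouped[GROUP_OF.get(normalise(label), label)] += count
    return grouped


# Counts copied from the June 2022 tables above.
counts = Counter({
    "Source": 400, "Source Code": 237, "Source code": 15, "Source-Code": 8, "source": 1,
    "Bug Tracker": 240, "Bug-Tracker": 9, "Bug tracker": 3, "Bug_Tracker": 1,
    "Funding": 59,
})
print(group_counts(counts).most_common())
# [('Source', 661), ('Bug Tracker', 253), ('Funding', 59)]
```

Normalising case and separators merges spelling variants such as Source-Code and Source code. Synonyms such as Repository or Tracker still need an explicit entry in the mapping.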