[four-point-oh] Reduce expensive github API calls

Open deathaxe opened this issue 3 years ago • 1 comments

TL;DR: Is a release's date field used for anything other than determining when a package was modified?

State Of The Art

Currently, client.download_info() makes an API call for each tag/release of a package or library.

https://github.com/wbond/package_control/blob/4e585fb656c5a0da03bd010bb9533fb055c7ab10/package_control/clients/github_client.py#L72-L74

This means theoretically up to 100 API calls per package/library!

Even a dependencies.json repository with just the minimum required / most popular libraries causes PC to hit the default GitHub API rate limit after only a couple of libraries.

... all of that just to delete the date field from the releases in the end!

The date of the most recent release seems to be used to fill the last_modified field of a package (not of a library), however. Am I right that it is required for packagecontrol.io only?

Proposal

We could modify the providers to explicitly make an API call for the most recent release only (or for the branch, in the case of branch-based releases). This would reduce API calls significantly by omitting the implicit call made for each release just to determine its date field.

This would limit PC to 2 API calls per package/library: one for the history and one for the date of the most recent release.

It should save some bandwidth and improve crawling performance significantly.
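To make the difference in call counts concrete, here is a rough sketch, not the actual github_client.py code. The function names, the `Fetch` callable and the `CountingFetcher` stub are all hypothetical, and it assumes the tag list comes back with the most recent release first. The two paths follow the shape of GitHub's REST endpoints for listing tags and getting a commit.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical fetcher: takes an API path and returns the decoded JSON.
Fetch = Callable[[str], Any]


def _commit_date(fetch: Fetch, repo: str, sha: str) -> str:
    """One API call: look up the committer date of a single commit."""
    commit = fetch(f"/repos/{repo}/commits/{sha}")
    return commit["commit"]["committer"]["date"]


def releases_with_all_dates(fetch: Fetch, repo: str) -> List[Dict[str, str]]:
    """Per-release lookups: 1 call for the tag list + 1 call per tag."""
    tags = fetch(f"/repos/{repo}/tags")
    return [
        {"version": tag["name"],
         "date": _commit_date(fetch, repo, tag["commit"]["sha"])}
        for tag in tags
    ]


def releases_with_latest_date(
    fetch: Fetch, repo: str
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Proposed shape: 1 call for the tag list + at most 1 call for a date."""
    tags = fetch(f"/repos/{repo}/tags")
    releases = [{"version": tag["name"]} for tag in tags]
    # Assumption: tags[0] is the most recent release.
    last_modified = (
        _commit_date(fetch, repo, tags[0]["commit"]["sha"]) if tags else None
    )
    return releases, last_modified


class CountingFetcher:
    """Offline stand-in for the GitHub API that counts how often it is called."""

    def __init__(self, tags: Dict[str, str]):
        # Maps tag name -> commit date; the sha is faked as "sha-<name>".
        self._tags = tags
        self.calls = 0

    def __call__(self, path: str) -> Any:
        self.calls += 1
        if path.endswith("/tags"):
            return [{"name": n, "commit": {"sha": f"sha-{n}"}} for n in self._tags]
        name = path.rsplit("/sha-", 1)[1]
        return {"commit": {"committer": {"date": self._tags[name]}}}
```

With N tags the first function costs N + 1 calls and the second never more than 2, regardless of how many releases exist.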

deathaxe • Aug 15 '22 17:08

Now that I got basic auth working, here are some stats.

A dependencies.json repository with the following 7 libs causes 180 API calls at the time of writing.

  • backrefs
  • bracex
  • lsp_utils
  • mdpopups
  • pyyaml
  • sublime_lib
  • wcmatch

... just to throw their results away.
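For comparison, a back-of-the-envelope calculation of what the proposed 2 calls per library would mean for this repository. The 180 figure is the measurement above. The 60 requests/hour value is GitHub's documented limit for unauthenticated REST requests, not something measured here.

```python
LIBRARIES = [
    "backrefs", "bracex", "lsp_utils", "mdpopups",
    "pyyaml", "sublime_lib", "wcmatch",
]
MEASURED_CALLS = 180               # figure reported above
CALLS_PER_LIBRARY_PROPOSED = 2     # history + date of most recent release
UNAUTHENTICATED_LIMIT_PER_HOUR = 60  # GitHub's documented default, for reference

proposed_calls = len(LIBRARIES) * CALLS_PER_LIBRARY_PROPOSED
saved_calls = MEASURED_CALLS - proposed_calls
avg_calls_now = round(MEASURED_CALLS / len(LIBRARIES), 1)

print(proposed_calls)   # 14
print(saved_calls)      # 166
print(avg_calls_now)    # 25.7
print(MEASURED_CALLS > UNAUTHENTICATED_LIMIT_PER_HOUR)  # True
print(proposed_calls > UNAUTHENTICATED_LIMIT_PER_HOUR)  # False
```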

deathaxe • Aug 15 '22 21:08