
Make AppImage update tools aware of GitHub's rate limit

Open simoniz0r opened this issue 6 years ago • 67 comments

Continued from https://github.com/AppImage/AppImageKit/issues/653#issuecomment-380662974

... the AppImage update tools should be aware of GitHub's rate limit and possibly even allow users to log in for checks made to GitHub's API if they are making use of GitHub's API. Only 60 checks per hour are allowed without authentication (this goes up to 5000 per hour with authentication); after you hit the limit, you just get an error response. A user could very easily hit that limit and possibly even raise some flags on GitHub's end if they continue to try to connect after hitting it.

https://developer.github.com/v3/rate_limit/
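
For reference, the quota status can be queried via that endpoint without spending any of it. Below is a minimal, purely illustrative sketch (not AppImageUpdate code), assuming libcurl is available:

```cpp
// Sketch: print GitHub's current rate limit status. The /rate_limit
// endpoint itself does not count against the quota.
// Build with: g++ rate_limit.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char *data, size_t size, size_t nmemb, void *out) {
    static_cast<std::string *>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    std::string body;
    curl_easy_setopt(curl, CURLOPT_URL, "https://api.github.com/rate_limit");
    // GitHub's API rejects requests without a User-Agent header.
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "rate-limit-check-example");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    if (curl_easy_perform(curl) == CURLE_OK)
        std::cout << body << std::endl;  // JSON with "limit", "remaining", "reset"

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```

The same request made with an `Authorization: token <TOKEN>` header reports the 5000-per-hour authenticated quota instead.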

simoniz0r avatar Apr 12 '18 12:04 simoniz0r

@simoniz0r you're right, this might be an issue; I recently hit the rate limit myself a few times with another application.

I'd rather consider using web scraping to work around these restrictions, though. GitHub doesn't need to track update checks, IMO. Also, people shouldn't have to login with GitHub just to check for updates...

TheAssassin avatar Apr 12 '18 16:04 TheAssassin

Maybe we could use the developer's GitHub token? But keeping it safe is another big deal... Web scraping is a good idea, but it can add a lot of development overhead.

antony-jr avatar Apr 12 '18 18:04 antony-jr

But keeping it safe is another big deal

How would you keep it "safe"? And, more importantly, imagine 2500 users using that token to perform one update check each: the rate limit is triggered...

TheAssassin avatar Apr 12 '18 19:04 TheAssassin

I've personally found web scraping to be much, much less reliable than using GitHub's API. Authentication is pretty easy to do and completely mitigates this issue; no average user is going to hit the rate limit at 5000 checks per hour.

Maybe we could use the developer's GitHub token? But keeping it safe is another big deal... Web scraping is a good idea, but it can add a lot of development overhead.

That's not gonna work. The rate limit is tied to the token, so all users of the AppImage update tools combined would only be able to do 5000 checks per hour in total, not 5000 checks per hour each as they would if they authenticated with their own credentials.

simoniz0r avatar Apr 12 '18 19:04 simoniz0r

The first thing I'd do here is to make AppImageUpdate "aware" of the rate limit by evaluating API responses and checking whether the rate limit has been hit (and showing a meaningful error message if so).

We could think about using some environment variable like GITHUB_TOKEN to allow users to pass the token to the tool. Then we can think about further measures.
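
To make "aware" concrete, here is a rough sketch of the kind of check meant above (all names made up, nothing from the actual codebase): GitHub answers exhausted-quota requests with HTTP 403 and X-RateLimit-* headers, which can be turned into a readable error:

```cpp
// Sketch only: classify an API response as "rate limited" and build a
// user-facing message from the X-RateLimit-* headers GitHub returns.
#include <iostream>
#include <map>
#include <string>

bool isRateLimited(long status, const std::map<std::string, std::string> &headers) {
    // Quota exhaustion shows up as 403 with X-RateLimit-Remaining: 0.
    if (status != 403)
        return false;
    auto it = headers.find("X-RateLimit-Remaining");
    return it != headers.end() && it->second == "0";
}

std::string rateLimitMessage(const std::map<std::string, std::string> &headers) {
    std::string msg = "GitHub API rate limit exceeded";
    auto it = headers.find("X-RateLimit-Reset");
    if (it != headers.end())
        msg += ", quota resets at Unix time " + it->second;
    return msg + "; set GITHUB_TOKEN for authenticated (5000/h) checks";
}

int main() {
    // Example values as they would appear on a rate-limited response.
    std::map<std::string, std::string> headers{{"X-RateLimit-Remaining", "0"},
                                               {"X-RateLimit-Reset", "1525380000"}};
    if (isRateLimited(403, headers))
        std::cerr << rateLimitMessage(headers) << std::endl;
    return 0;
}
```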

TheAssassin avatar Apr 12 '18 20:04 TheAssassin

We could think about using some environment variable like GITHUB_TOKEN to allow users to pass the token to the tool. Then we can think about further measures.

This seriously makes the end user suffer a lot. I guess web scraping is the best choice here, because GitHub does not change the site very often.

antony-jr avatar Apr 12 '18 20:04 antony-jr

For an end user, he/she must create a GitHub account and get a token just to check for updates? I would rather download the new version directly from the GitHub releases myself.

antony-jr avatar Apr 12 '18 20:04 antony-jr

@antony-jr you got me wrong there: the requests will of course continue to be made anonymously first. The token will only be used once the rate limit is hit.

Anything else would conflict with my privacy philosophy. I am not convinced that a token, even though it does solve the issue, is a good idea privacy-wise, but I would agree to optionally offer such a solution for now, to help users affected by the bug right now.

Of course it doesn't make sense to always require a token. And even if that was a necessity, the UX of setting an environment variable is really... well, you can imagine. As said, we will need to think of alternative ways of asking users to log into GitHub when making update checks.

I'm sure @simoniz0r didn't suggest requiring a GitHub token either. I just imagine @simoniz0r has a lot of AppImages, and after a while, the rate limit is hit, making future update checks impossible. That might happen even more often in the future, since more and more AppImages are being published every day. Therefore, it's good to discuss the issue now, before the majority of users run into it.
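
Roughly this control flow, in other words (fetchReleaseInfo() is just a made-up placeholder for the real API call, not existing code):

```cpp
// Sketch: stay anonymous by default, only fall back to an optional
// GITHUB_TOKEN once the anonymous quota is exhausted.
#include <cstdlib>
#include <iostream>
#include <string>

struct ApiResult {
    bool rateLimited = false;
    std::string body;
};

// Placeholder for the real HTTPS request; an empty token means anonymous.
// This stub just pretends the anonymous quota is already used up.
ApiResult fetchReleaseInfo(const std::string &token) {
    ApiResult r;
    r.rateLimited = token.empty();
    return r;
}

ApiResult checkForUpdate() {
    // Always try anonymously first -- no GitHub account required.
    ApiResult result = fetchReleaseInfo("");
    if (!result.rateLimited)
        return result;

    // Only once the rate limit is hit, use a token the user opted to provide.
    const char *token = std::getenv("GITHUB_TOKEN");
    if (token && *token) {
        std::cerr << "rate limit hit, retrying with GITHUB_TOKEN" << std::endl;
        return fetchReleaseInfo(token);
    }
    std::cerr << "rate limit hit and no GITHUB_TOKEN set" << std::endl;
    return result;
}

int main() {
    checkForUpdate();
    return 0;
}
```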

TheAssassin avatar Apr 12 '18 20:04 TheAssassin

@TheAssassin Got it :+1:

Note: even jspm uses an environment variable to solve this issue.

antony-jr avatar Apr 12 '18 20:04 antony-jr

the UX of setting an environment variable is really... well, you can imagine.

Some Desktop Environments actually allow you to set environment variables for the whole session in an easy to use GUI. LXQt, for example, has a very easy to use option to do this in their session manager tool.

It's really not the greatest solution, but scraping the raw HTML has been much less reliable for me compared to using their API.

simoniz0r avatar Apr 12 '18 20:04 simoniz0r

It's not like I was keen on writing some screen scraping in C++... But on the other hand, imagine a user has 60 AppImages of which >= 50% use GitHub (not that unlikely, really); I don't really want to ask such a non-dev user to register with a specific service just for update checks.

TheAssassin avatar Apr 12 '18 20:04 TheAssassin

A much nicer alternative would be for AppImageHub to keep track of AppImage versions so that the user doesn't ever have to.

simoniz0r avatar Apr 12 '18 21:04 simoniz0r

A much nicer alternative would be for AppImageHub to keep track of AppImage versions so that the user doesn't ever have to.

@simoniz0r Using version numbers is not a good idea, and not all developers want to register their AppImage in AppImageHub and keep on updating the version info.

antony-jr avatar Apr 12 '18 21:04 antony-jr

@simoniz0r Using version numbers is not a good idea, and not all developers want to register their AppImage in AppImageHub and keep on updating the version info.

Y'all keep saying that, yet the alternatives to AppImages (snaps and flatpaks) have no problem doing this and are a much nicer experience to keep up to date.

simoniz0r avatar Apr 12 '18 21:04 simoniz0r

@antony-jr I'd like to add that implementing every kind of versioning that you could imagine is a lot of work.

Y'all keep saying that, yet the alternatives to AppImages (snaps and flatpaks) have no problem doing this and are a much nicer experience to keep up to date.

That's not entirely correct, @simoniz0r. Last year I was at an open-source conference where a snap core dev talked about these topics, and we had a long conversation about it. In the snap ecosystem, version numbers are merely considered "tags" of a file; they aren't used for anything internally.

TheAssassin avatar Apr 12 '18 21:04 TheAssassin

Well, whatever mechanism they're using for tracking versions is honestly a much nicer experience than trying to keep AppImages up to date.

snaps are honestly my reference for how AppImages should work for the most part. Everything about creating and updating them (with the exception of snapd being required) is much nicer to use than AppImage tools. The main reason that I still prefer AppImages to snaps is that AppImages require no additional software to be used, but it would still be nice to see AppImage adopt some of the ways of handling things that have been working very well for snap.

simoniz0r avatar Apr 12 '18 21:04 simoniz0r

@TheAssassin About keeping the GITHUB_TOKEN safe: why can't we use a UUID + timestamp to create a key for stronger encryption (AES-256 or Blowfish) at build time, so that we can encrypt the GITHUB_TOKEN for the user, or a developer-given one?

antony-jr avatar May 03 '18 14:05 antony-jr

You mean, to be able to embed a key upstream? That is not encryption, it's obfuscation. Anybody can just read the key from the binary. Also, that'll allow tracking of every request AppImageUpdate will ever make. And it isn't scalable, the more users use the key, the faster the rate limit will be hit.

TheAssassin avatar May 03 '18 14:05 TheAssassin

Anybody can just read the key from the binary.

That's relative. Without debugging information embedded, it would be very hard to decompile a binary, especially one created with a high-level language such as C++. But as developers we cannot rely on assumptions, so let's agree that this is a bad idea; we could use public-key cryptography instead, but anyway, let's just forget about this.

And it isn't scalable, the more users use the key, the faster the rate limit will be hit.

I agree with that too.

So another idea hit me: instead of scraping GitHub, how about we commit the update information into the repo, so we can get the metadata (the .zsync file) with plain raw requests, without the help of the GitHub API or any user tokens? Just like a dumb web server.

Important: we should not commit the new binary into the repo. Instead we just need the .zsync file committed, with its header pointing to the new binary at the releases (we can get the file location after the upload by using the developer's GITHUB_TOKEN at build time, which does not run out). This auto-commit process should take place after the upload of the binary.

So when AppImageUpdate hits the limit, it would just look for the metafile with a raw request, which would then let it run the zsync algorithm.
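
A hypothetical sketch of that fallback; the appimage-update/ directory and the file name below are invented for illustration, nothing defines them yet. The point is that raw.githubusercontent.com is not subject to the API rate limit:

```cpp
// Sketch: build the raw.githubusercontent.com URL for a .zsync file that
// was committed to the repo. The directory layout below is purely an
// invented convention for illustration.
#include <iostream>
#include <string>

std::string rawZsyncUrl(const std::string &user, const std::string &repo,
                        const std::string &branch, const std::string &zsyncName) {
    return "https://raw.githubusercontent.com/" + user + "/" + repo + "/" +
           branch + "/appimage-update/" + zsyncName;
}

int main() {
    // The committed .zsync would carry a URL header pointing at the real
    // AppImage attached to the GitHub release, as described above.
    std::cout << rawZsyncUrl("antony-jr", "AppImageUpdater", "master",
                             "AppImageUpdater-x86_64.AppImage.zsync")
              << std::endl;
    return 0;
}
```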

antony-jr avatar May 03 '18 15:05 antony-jr

@antony-jr I don't need to decompile the binary; there are a ton of other ways, e.g., getting a memory dump. This is not secure. Period. But we don't want to embed such a key upstream at all. (And public-key cryptography doesn't solve anything here either; you always need a key to decrypt a secret.)

Committing reproducible files is a really bad idea in general. Please check the internet, there are tons of articles rejecting the idea.

TheAssassin avatar May 03 '18 15:05 TheAssassin

Committing reproducible files is a really bad idea in general. Please check the internet, there are tons of articles rejecting the idea.

Can you give me pointers? Why is this a bad idea if it does not raise any security issues or rate limiting? I think I have to reiterate: we just have to upload the .zsync (metafile) to the repo, and then we can get the metafile directly from the repo, just like we retrieve a raw file from GitHub.

Build (Travis CI)

  • Build the source files.
  • Package with AppImage.
  • Upload the AppImage and the zsync to the releases.
  • Commit only the .zsync file to the repo under a specified directory.

Update

  • Check for updates using the GitHub API.
  • If the GitHub API fails, then look for the .zsync file under the specified directory in that repo.
  • Like so -> https://raw.githubusercontent.com/antony-jr/AppImageUpdater/master/.img/poster.png
  • Now do the syncing.
  • Note that the .zsync file committed to the repo has a modified URL field pointing directly to the binary at the releases.

antony-jr avatar May 03 '18 15:05 antony-jr

Are we discussing a theoretical problem here? I've never run into any rate limits as a user, as they are per-IP, right?

probonopd avatar May 03 '18 16:05 probonopd

Are we discussing a theoretical problem here? I've never run into any rate limits as a user, as they are per-IP, right?

@probonopd Yes, rate limiting is a real problem. It's not a big problem for now, since AppImages are still few in number, but imagine a ton of software getting updated simultaneously: the rate limiting will kick in and everything falls apart. And please note that there is a very good chance that a lot of AppImages use GitHub as their development platform.

If you want to reproduce this bug, try @simoniz0r's tool -> https://github.com/simoniz0r/spm or use AppImageUpdate with a large number of AppImages. Also, it's not rare that a user might want to update a ton of software (like we do now with a package manager).

@TheAssassin Another way to solve this is to use the dumb link to the zsync file (the first transport method from the specification).

antony-jr avatar May 03 '18 17:05 antony-jr

I have run into rate limit problems many times personally.

simoniz0r avatar May 03 '18 17:05 simoniz0r

@probonopd obviously @simoniz0r hit the rate limit, so it's not as theoretical as you might imagine. With app stores like the NX Software Center, those checks are performed a lot more frequently, too. We need a fix in the near future.

TheAssassin avatar May 03 '18 17:05 TheAssassin

@TheAssassin Remember the dumb link to a zsync file. In my observation, the normal zsync update method and the GitHub update method are similar; the wildcard is the only difference. So with some luck we could transform GitHub update information into a dumb zsync update method. I think this has to get into the AppImage specification.

Example

gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync can be transformed into -> zsync|https://github.com/probono/AppImages/releases/download/latest/Subsurface-*x86_64.AppImage.zsync

But here we face the problem with the wildcard, and in that case the only way is web scraping (to the best of my knowledge).
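
A small sketch of that transformation (illustrative only); it simply bails out when the file field contains a wildcard, which is exactly the unsolved case:

```cpp
// Sketch: turn "gh-releases-zsync|user|repo|tag|file" update information
// into a plain "zsync|<url>" string; this only works without wildcards.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::string toPlainZsync(const std::string &updateInfo) {
    std::vector<std::string> fields;
    std::stringstream stream(updateInfo);
    std::string field;
    while (std::getline(stream, field, '|'))
        fields.push_back(field);

    // Expected layout: gh-releases-zsync|<user>|<repo>|<tag>|<filename>
    if (fields.size() != 5 || fields[0] != "gh-releases-zsync")
        return "";
    if (fields[4].find('*') != std::string::npos)
        return "";  // wildcard in the filename: a direct URL cannot be built

    return "zsync|https://github.com/" + fields[1] + "/" + fields[2] +
           "/releases/download/" + fields[3] + "/" + fields[4];
}

int main() {
    // Works only because this example uses a static file name.
    std::cout << toPlainZsync("gh-releases-zsync|probono|AppImages|latest|"
                              "Subsurface-x86_64.AppImage.zsync")
              << std::endl;
    return 0;
}
```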

antony-jr avatar May 03 '18 18:05 antony-jr

@antony-jr check out what appimagetool does. I think it still uses this method. As you correctly recognized, this only works with files with static names, like appimagetool or linuxdeployqt.

TheAssassin avatar May 03 '18 18:05 TheAssassin

@TheAssassin But what if we could do this for the GitHub update method as a fail-safe? Then this problem would never exist, if only we could get the name of the zsync file. Does anything come to mind on wildcards and URLs? If we could answer this question, then this problem could be solved without any major modification to the code.

antony-jr avatar May 03 '18 18:05 antony-jr

@antony-jr you can't solve anything with "auto commits". Most authors don't want such stuff. And a dependency on git isn't very nice either.

TheAssassin avatar May 03 '18 18:05 TheAssassin

@antony-jr you can't solve anything with "auto commits". Most authors don't want such stuff. And a dependency on git isn't very nice either.

@TheAssassin Let's agree to disagree. This is the final solution I have...

  • Use gh-releases-zsync as the first stage.
  • If the first stage fails, then move on to the first fail-safe.
  • Check if the gh-releases-zsync entry uses a static filename without a wildcard.
  • If so, convert the gh-releases-zsync entry to a normal zsync update method.
  • If not, move on to the second fail-safe.
  • Using very simple web scraping, get the information needed for the syncing.

This is what I can think of off the top of my head; a rough sketch of that fallback chain is below.
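
Something like this skeleton (every stage function is an empty placeholder for one of the stages listed above; none of this exists yet):

```cpp
// Skeleton of the proposed fallback chain; the three stage functions are
// empty placeholders standing in for real implementations.
#include <string>

bool updateViaGithubApi(const std::string &)      { return false; } // stage 1: gh-releases-zsync via API
bool updateViaDirectZsyncUrl(const std::string &) { return false; } // fail-safe 1: static name, plain zsync
bool updateViaScraping(const std::string &)       { return false; } // fail-safe 2: scrape the releases page

bool doUpdate(const std::string &updateInfo) {
    if (updateViaGithubApi(updateInfo))
        return true;  // normal path, rate limit not hit
    if (updateViaDirectZsyncUrl(updateInfo))
        return true;  // only possible without a wildcard in the file name
    return updateViaScraping(updateInfo);  // last resort
}

int main() {
    doUpdate("gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync");
    return 0;
}
```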

antony-jr avatar May 03 '18 19:05 antony-jr