python-github-backup

API request returned HTTP 500: Internal Server Error

Open rshad opened this issue 5 years ago • 4 comments

Issue Description

I'm trying to back up all my repositories with all their contents (PRs, issues, code, etc.). Note that some repos are huge (more than 5000 issues, for example), and the github-backup binary sometimes fails with the error:

API request returned HTTP 500: Internal Server Error

I first thought it was related to reaching the maximum permitted request rate per hour (5000), but that's not the case; the error occurred while I still had ~2500 requests remaining.
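For reference, GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header (that header name is GitHub's documented one; the helper below is just an illustrative sketch, not part of github-backup):

```python
def remaining_requests(headers):
    """Extract the remaining request quota from GitHub API response headers."""
    # GitHub returns header values as strings, e.g. "2500"
    return int(headers.get("X-RateLimit-Remaining", "0"))

# Example with the numbers from this report:
headers = {"X-RateLimit-Limit": "5000", "X-RateLimit-Remaining": "2500"}
print(remaining_requests(headers))  # → 2500
```

Since roughly half the quota was still available here, the 500 is unlikely to be a rate-limit response.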

I also tried backing up only the issues, or only the PRs, and no error is produced.

I'm convinced it's related to the huge number of requests being made, but I could not determine the real reason.

Any ideas?

Kr,

Rshad

rshad avatar Jan 21 '20 16:01 rshad

I've identified some issues with the error handling functions that will cause the script to terminate early when it shouldn't. I'm going to be working on a PR to fix it when I have some time.

The script has some code in it to do an automatic backoff when it hits a rate-limiting error, but due to the above-mentioned issues with the error handling, I don't believe that part is working correctly.
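For context, a retry-with-backoff wrapper around an API call might look like the following. This is my own sketch, not the script's actual implementation; `fetch` stands in for any function that raises on a transient error such as an HTTP 500:

```python
import time

def with_backoff(fetch, retries=5, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on transient errors."""
    for attempt in range(retries):
        try:
            return fetch()
        except RuntimeError:  # stand-in for a transient HTTP 500 / rate-limit error
            if attempt == retries - 1:
                raise  # out of retries, propagate the failure
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

The key point is that a transient 500 should trigger a retry rather than terminating the whole backup run.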

Though keep in mind that if you're trying to back up tens of thousands of items, or anything much higher than a total of 5000, it will take a very long time due to the rate limiting. I'd suggest breaking it up into multiple calls, one every couple of hours. For example, back up only repos, then only PRs, then only issues, etc. That should at least speed things up a bit. In fact, I wrote a script meant to be used with cron to easily allow backing up different things at different times without putting complicated commands into the crontab. I'm still tweaking it, but I'll probably submit it as a PR to include in the repo when I'm done with it.
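The "multiple calls" idea above could be sketched as building one command line per category. The flag names here (`--repositories`, `--issues`, `--pulls`) mirror github-backup's CLI, but double-check them against your installed version:

```python
def build_commands(user, categories):
    """Build one github-backup invocation per backup category."""
    return [["github-backup", user, flag] for flag in categories]

cmds = build_commands("rshad", ["--repositories", "--issues", "--pulls"])
print(cmds[0])  # → ['github-backup', 'rshad', '--repositories']
```

Each command can then be run on its own schedule, so a single failure or rate-limit hit only affects one category.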

einsteinx2 avatar Jan 22 '20 02:01 einsteinx2

You can see my open issue to refactor the error handling here for more detailed information or to track the progress: https://github.com/josegonzalez/python-github-backup/issues/138

einsteinx2 avatar Jan 22 '20 02:01 einsteinx2

Hi @einsteinx2 !

Thanks for answering.

Actually, I was thinking about the same approach: running the backup by category (PRs, issues, wikis, ...) for each of my repos.

import subprocess, time

for category in categories:  # e.g. ["--issues", "--pulls", "--wikis"]
    for repo in repositories:
        subprocess.run(["github-backup", repo, category])  # exact flags depend on your setup
    time.sleep(10)  # seconds

However, it could still fail when the number of issues or PRs is very large.

I'll give it a try, and I'll look at your solution once it's ready.

Kr,

Rshad

rshad avatar Jan 22 '20 10:01 rshad

My solution is more or less what you're doing, but using cron to put a >1 hour delay between each run, and doing it overnight. That way your 5000 allowed requests are reset before each run.
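A cron schedule along those lines might look like this. This is a hypothetical crontab sketch: the user, output path, and flags are placeholders to adjust to your setup:

```crontab
# Back up a different category each night, two hours apart,
# so each run starts with a fresh 5000-request quota.
0 1 * * * github-backup rshad --repositories -o /backups
0 3 * * * github-backup rshad --issues -o /backups
0 5 * * * github-backup rshad --pulls -o /backups
```

Spacing the runs more than an hour apart guarantees the hourly rate-limit window has rolled over between categories.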

einsteinx2 avatar Jan 22 '20 18:01 einsteinx2