git-bug
problems importing from github.com/nixos/nixpkgs
Ten minutes ago, while trying to git bug bridge pull the nixpkgs project's bugs (>5k bugs, >3k PRs; a very, very large set of bugs), I got:
import error: Something went wrong while executing your query. Please include `A8DE:54BE:87BAE:9F6E7:62140406` when reporting this issue.
I have never been able to get a pull of nixpkgs' bugs to complete, and I've been trying for weeks now (see also #740).
I am using 05d73e1b5321c97cd05133b5ae49d1798bc2fe5d, which has the fix for #585.
Is there any chance that git-bug could keep fetching other bugs when it encounters a problem like this? It appears that it just aborts as soon as any API call produces an error. The nixpkgs bugset is ginormous, so it's going to take me at least a week (due to ratelimits) anyways. But right now it keeps failing after 108 bugs out of 5000+.
Is there an environment variable I can set to cause git-bug to print a backtrace or other context when it gives up like this?
import error: Something went wrong while executing your query. Please include `96F8:8D83:295160:2A2B42:6214AE3E` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D1D4:02E4:B8460B:160070D:6214BB7A` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D25A:179B:23FA97:64C0C7:6214BD6F` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `B1D0:CA69:188D36:19C1B9:6214CAB9` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `DC06:4AA8:86762:96A45:6214D808` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `BCAC:8D83:653F02:675761:6214DA0A` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CA7A:6EAD:9E8FE8:A15B35:6214E738` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `D704:58D9:193918:19A2BB:6214F484` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9C32:41F8:BF7CB1:172FF12:621501C1` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9C86:62D0:471B40:B08239:621503C8` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `EB3C:8E88:466FFE:47B0D0:621510E9` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A71A:5B83:1C51E:27EA5:62151E39` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A7B8:4750:A7461:B3EE1:62152057` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9332:81CD:3B1F92:3CC6F8:62152D6C` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CC06:2824:17F801:56B3B8:62153AAA` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `A6E8:CEC2:8A2504:8CDFBB:621549CE` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `CD30:3456:5C995B:5F2A3F:6215571F` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `9E18:0AF2:CF4F53:D8D530:62156454` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `EA86:4003:1E512C:681416:62156642` when reporting this issue.
imported 0 issues and 0 identities with default bridge
import error: Something went wrong while executing your query. Please include `E05E:13CE1:1049D1D:108CD2E:62157381` when reporting this issue.
The error is from the github server. But I can tell you what I would try:
- You should put the nixpkgs repo in a tmpfs (mount -t tmpfs) because nixpkgs has a very large number of issues and git-bug writes more data than necessary on your disk while importing. Using a tmpfs might help you speed up the import.
- You can try to decrease the size of the queries to the github server here: https://github.com/MichaelMure/git-bug/blob/b2f0e126a10b5f030bfc35cac4b0cabcf083b589/bridge/github/import_mediator.go#L12-L15. These are the numbers of items requested from the server in one query. You could start by reducing NumIssues. If that doesn't work, try to reduce the other ones, too. This will slow the import down, but it might enable you to get a full import without error. If you have success with this method, please let me know which numbers you used.
- You can increase the number of retries in case the github server has an error: https://github.com/MichaelMure/git-bug/blob/b2f0e126a10b5f030bfc35cac4b0cabcf083b589/bridge/github/client.go#L110
I hope that helps. Let us know how it is going.
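The page-size knobs mentioned above trade the size of each query against the number of queries. The following is a minimal sketch of cursor-style pagination with a configurable page size; it is not git-bug's mediator code, and fetchPage is a hypothetical stand-in for one GraphQL query (its "cursor" is a plain index rather than GitHub's opaque cursor string).

```go
package main

import "fmt"

// page is one simulated response: the items it carries plus the position
// of the next page (-1 means there are no more pages).
type page struct {
	items      []int
	nextCursor int
}

// fetchPage simulates one query returning at most pageSize items starting
// at cursor. A real client would send a GraphQL request here.
func fetchPage(all []int, cursor, pageSize int) page {
	end := cursor + pageSize
	if end >= len(all) {
		return page{items: all[cursor:], nextCursor: -1}
	}
	return page{items: all[cursor:end], nextCursor: end}
}

// fetchAll walks every page and reports how many queries it needed.
func fetchAll(all []int, pageSize int) (items []int, queries int) {
	cursor := 0
	for {
		p := fetchPage(all, cursor, pageSize)
		queries++
		items = append(items, p.items...)
		if p.nextCursor < 0 {
			return items, queries
		}
		cursor = p.nextCursor
	}
}

func main() {
	issues := make([]int, 25)
	for i := range issues {
		issues[i] = i + 1
	}
	for _, size := range []int{20, 10, 5} {
		got, q := fetchAll(issues, size)
		fmt.Printf("page size %2d: %d issues in %d queries\n", size, len(got), q)
	}
}
```

Going from pages of 20 to pages of 5 turns 2 queries into 5 for the same 25 items: each query asks less of the server, but there are more of them counting against the rate limit.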
git-bug writes more data than necessary on your disk while importing.
We should address that at some point; any idea why that is?
Do you remember that discussion: https://github.com/MichaelMure/git-bug/pull/585#issuecomment-799393989. As far as I understand, .git/git-bug/bug-cache and .git/git-bug/identity-cache are rewritten after every imported issue, comment, edit, etc.
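To see why rewriting the caches after every imported item adds up, here is a back-of-the-envelope sketch. It assumes, hypothetically, that each rewrite serializes the whole cache and that every imported item adds one entry of fixed size; the real cache layout and sizes will differ.

```go
package main

import "fmt"

// bytesPerItemRewrite returns the total bytes written when the whole cache
// (one entrySize-byte entry per item imported so far) is rewritten after
// each of n items: entrySize * (1 + 2 + ... + n).
func bytesPerItemRewrite(n, entrySize int) int {
	return entrySize * n * (n + 1) / 2
}

// bytesSingleFlush returns the bytes written when the cache is serialized
// only once, after all n items have been imported.
func bytesSingleFlush(n, entrySize int) int {
	return entrySize * n
}

func main() {
	const entrySize = 200 // hypothetical bytes per cache entry
	for _, n := range []int{100, 1000, 25000} {
		fmt.Printf("%6d items: %12d bytes rewritten vs %9d bytes written once\n",
			n, bytesPerItemRewrite(n, entrySize), bytesSingleFlush(n, entrySize))
	}
}
```

With these hypothetical numbers, 25,000 items cost about 62.5 GB of rewrites against 5 MB written once; the quadratic term is what makes per-item rewriting hurt on an import the size of nixpkgs.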
Ha yes, I forgot about that, thanks.
The error is from the github server. But I can tell you what I would try:
Hey, thank you for your reply. Just to clarify:
- You should put the nixpkg repo in a tmpfs
Oh, the reason it's slow is not git-bug; it's github's API ratelimit. I did not mean to imply that git-bug was not performant.
- You can try to decrease the size of the queries to the github server here:
- You can increase the number of retries in case the github server has an error:
Thanks, I will give those a try tonight or tomorrow.
Testing now.
Is there any way to get git-bug bridge pull to print verbosely what it is doing? Right now it only prints to the console when it discovers new data. There is a long silent delay at the start of the pull where it isn't clear what it's doing.
Hrm, with latest HEAD and NumIssues=10 I get:
rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
rate limiting: Github GraphQL API rate limit. This process will sleep until 2022-03-01 04:12:20 +0000 UTC.
import error: API rate limit exceeded for user ID [redacted].
Apparently git-bug's API rate limit calculation disagrees with GitHub's?
NumIssues=20 did not solve the problem.
2. You can try to decrease the size of the queries to the github server here:
Decreasing these (as low as 20) did not help.
3. You can increase the number of retries in case the github server has an error:
Increasing this (as high as 10) did not help.
Is there any way to get more information about what git-bug was in the middle of doing when the error happened? Like a stack trace maybe?
I will have a look at it, but I will need a few days. In case I do not come back to this issue in about a week, please feel free to ping me.
@rng-dynamics it might make sense to implement another ImportResult/ExportResult for simple logs that the bridges could use to spit out more detailed information on what's happening. Those could be made visible with a --verbose flag or something like that.
At this point it would make sense to rename those to ImportEvent/ExportEvent.
@a-m-joseph, thanks for the bug report. The errors are seemingly arbitrary errors from GitHub. If you want, you can use my draft in #760. Even if there are GitHub errors, with #760 the bridge will continue importing the other issues, and you can run the import again and it will try to fetch the remaining issues. Have a look at the file git-bug-import after running the import. The content of the file should be self-explanatory.
Thanks, I'm trying this right now.
It just finished. I'll investigate the results tomorrow (but wow, the issue count looks correct!)
...
new issue: e0612737cebde57cb49b3f596ce045235a260258f70037d0f9860759ceba3e04
changed label: 682fcc468648af894d56930f714d36124db5ad5348e250ddb6ad42ce1b5b7bd0
import error: context deadline exceeded
imported 24645 issues and 6644 identities with default bridge
Appears to be working!
Thank you so much for taking the time to look into this.
I'm very sorry to trouble you again, I really appreciate the time you've put into this already.
Unfortunately after taking a closer look at the results of the import, I have the correct number of bugs, but:
- After the initial import, git bug bridge pull (with or without -n) no longer picks up new changes. Three days after the initial import (with plenty of new bugs opened in the interim):
$ ../gitbug/git-bug bridge pull
imported 0 issues and 0 identities with default bridge
$ ../gitbug/git-bug bridge pull -n
imported 0 issues and 0 identities with default bridge
- Even the initial import seems to be missing a lot of activity. Strangely, both git bug ls --by edit and git-bug ls --by creation show the same bug as being most recent:
$ git-bug ls --by creation | tail
52ba089 open Home Assistant module: Use Postgresql ◼ 21stce (21stce)
5c52993 open nixos manual: missing warning about firmware ◼ Björn Gohla (b… 4
6a9acf2 open qutebrowser.aarch64-linux broken due to pyqt5 n… ◼ Collin Arnett …
9095ca7 open mate-utils pulls inkscape to the system ◼ ilya-fedin (il… 10
9d6cf34 open Keepmenu ◼ Stefan Machmei… 1
fdd78d6 open python3: Enabling optimisations as documented… ◼ ◼ David Nadlinge…
5a037d8 open Slack fails to install on Mac OS Monterey ◼ Tyler Levine (…
e061273 open G'Mic missing from Krita ◼ Aidan Gauland …
9241e1b open CUPS web interface hangs at 'add printer' 4/5… ◼ ◼ Alain Zscheile…
66a48f2 open qt5base: apple silicon: fails to build ◼ ◼ ◼ Matthew Leach …
$ git-bug ls --by edit | tail
52ba089 open Home Assistant module: Use Postgresql ◼ 21stce (21stce)
5c52993 open nixos manual: missing warning about firmware ◼ Björn Gohla (b… 4
6a9acf2 open qutebrowser.aarch64-linux broken due to pyqt5 n… ◼ Collin Arnett …
9095ca7 open mate-utils pulls inkscape to the system ◼ ilya-fedin (il… 10
9d6cf34 open Keepmenu ◼ Stefan Machmei… 1
fdd78d6 open python3: Enabling optimisations as documented… ◼ ◼ David Nadlinge…
5a037d8 open Slack fails to install on Mac OS Monterey ◼ Tyler Levine (…
e061273 open G'Mic missing from Krita ◼ Aidan Gauland …
9241e1b open CUPS web interface hangs at 'add printer' 4/5… ◼ ◼ Alain Zscheile…
66a48f2 open qt5base: apple silicon: fails to build ◼ ◼ ◼ Matthew Leach …
The 66a48f2 bug in my import is this one; not sure if that helps.
Even weirder, the bridge is importing dates correctly, but they don't seem to be used when sorting! For example, git bug ls considers 66a48f2 to be newest, yet there are plenty of bugs with larger create_time and edit_time values:
$ git-bug show --format=json 66a48f2 | grep -A1 time
"create_time": {
"timestamp": 1626299686,
"time": "2021-07-14T14:54:46-07:00"
},
"edit_time": {
"timestamp": 1626328489,
"time": "2021-07-14T22:54:49-07:00"
},
$ git-bug show --format=json 52ccab0 | grep -A1 time
"create_time": {
"timestamp": 1646198867,
"time": "2022-03-01T21:27:47-08:00"
},
"edit_time": {
"timestamp": 1646969884,
"time": "2022-03-10T19:38:04-08:00"
},
Again, I'm really sorry to pester you over this, I feel like I've already really pushed the boundaries of your generosity with your time. Unfortunately I don't know Go, so debugging this myself is not really feasible.
In the event that this isn't something you can spend any more time on (which I completely understand), do you consider git-bug's repository format to be stable enough that it is okay for people to write bridges which aren't distributed as part of the git-bug codebase? I can probably spare enough time to write a really good github importer, one that also imports pull requests (which github apparently treats as a special kind of bug). Unfortunately I can't really take on learning a whole new programming language and ecosystem right now, that's a much larger time commitment.
Regarding the github bridge: The Github API is quite fragile and we are trying to keep up with its repeatedly changing quirks. The code which you used is only a draft merge request. Anyway, for my own convenience I used a file git-bug-import to keep track of the progress. If that file is present, then the bridge will import only the issues listed in that file. If you want to import all issues, just delete the git-bug-import file. Then the bridge will import all issues and write a new file. Of course you can also write your own git-bug-import file. For example, if you want to import issues 23, 45, and 65, create the file git-bug-import with the following content and start the import:
# file: git-bug-import
23
45
65
The bridge will update the file according to the import status. After the import the file might look as follows.
# file: git-bug-import
# 23 # imported # Mon 01 Jan 07:03:25 2022
# 45 # imported # Mon 01 Jan 07:03:42 2022
65 # import error # Mon 01 Jan 07:03:56 2022
If you ran the import again, it would read the file again and try to import only issue 65, which failed in the previous import.
The other problems which you describe are probably not caused by the importer but some other component in git-bug.
I would actually really appreciate input on how to improve the importer. My main headaches with the importer are (1) the fragility and changing error-behaviour of the GitHub GraphQL API, and (2) the user interface/interaction in case of errors during the import. How could we get closer to that really good importer?
Regarding the github bridge: The Github API is quite fragile and we are trying to keep up with its repeatedly changing quirks.
My main headaches with the importer are (1) the fragility and changing error-behaviour of the GitHub GraphQL API
Thank you for explaining this.
Personally, I much prefer the sort of workflow the Linux kernel uses, where inter-developer communication uses simple protocols and the developers themselves select the complex tools that suit them best.
Whenever I bring this up in discussions, people always come back with "But GitHub has a usable and reliable API that you can use if you want that!" I've long suspected that this was not, in fact, true. I appreciate your confirmation of this, as someone who has worked on a major integration project with this API.
How could we get closer to that really good importer?
Mainly, I would add several levels of debugging output so I can see what's failing, including (1) a "here's what I'm doing" log and (2) a wire-protocol dump (interleaved with (1)).
Anyone know if it's still an issue?
I don't think that "hope the problem goes away" is really a solution. The problem still exists at 70bd7377b6362127794f3a6198dd2c63863025fc.
$ ../gitbug/git-bug bridge pull
import error: non-200 OK status code: 502 Bad Gateway body: "{\n \"data\": null,\n
\"errors\":[\n {\n \"message\":\"Something went wrong while executing your
query. This may be the result of a timeout, or it could be a GitHub bug. Please include
`AD12:087E:226594A:22E4D90:637D629A` when reporting this issue.\"\n }\n ]\n}\n"
Note that the error above is a github error. Clearly they are trying to say "we need you to rephrase your query, but we are not interested in giving you any hints about how to do that. neener neener."
I guess "stop using github for large projects" is really the only viable solution to the fact that github's api does not scale, nor does github care at all about that fact.
It looks like a temporary failure on the GitHub side to me (like a burst of requests causing a CPU overload, cascading into a failure to handle the request). Basically, the P99999 case that is hard for cloud engineers both to track down and to fix.
git-bug makes so many requests in that situation that it increases the likelihood of that happening. The problem is that instead of retrying, git-bug fails entirely. We need to both:
- have better retries for this, maybe only failing entirely if several of those errors occur, with a gradually increasing delay between retries
- have better "resume" mechanism
It looks like a temporary failure
I assure you, it is not temporary. I left it running for several days in a loop a while back and it never finished.
I meant that it's a transient failure on the GitHub side: the exact same request would succeed later, which means that git-bug is making a valid request.
The problem is in how we handle those random failures.
Not if some aspect of the failure is really an undocumented ratelimit/cpulimit.
I strongly suspect that is the case.