CI cache currently not working
Quick heads up: we're having some trouble with our GitHub cache. The mirror itself is working, but GitHub can't connect to it. We saw some other IPv6 problems from Microsoft this week, so it may be related. It's being worked on, but I don't have an ETA or anything!
We've got a v4.cveb.in set up that should route correctly, but it turns out `--use-mirror` is broken (it doesn't work no matter what URL I give it), so we'll need to fix that before I can test it out.
Okay, so `--use-mirror` uses the old mirror system, which we... probably don't even need to support any more. But I'm having some other problems getting it switched over so that `--use-mirror` passes the URL through to the json-mirror correctly. So in the meantime I've put up #5171, which should let the cache job run directly against api2 for now. Once CI is done with it and it merges, we'll see if that gets us a usable cache in GitHub Actions again so I can keep debugging more easily.
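(Side note: if anyone wants to see what these hosts resolve to from their own network while we debug, here's a quick check. The hostnames are just the ones mentioned in this thread; nothing else about the setup is assumed.)

```python
# Print which IPv4 and IPv6 addresses each mirror hostname resolves to from
# the machine running the script; an empty list means no records for that family.
import socket

def addresses(host: str) -> dict[str, list[str]]:
    out: dict[str, list[str]] = {"IPv4": [], "IPv6": []}
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            infos = socket.getaddrinfo(host, 443, family, socket.SOCK_STREAM)
            out[label] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            pass  # no records for this address family
    return out

for host in ("mirror.cveb.in", "v4.cveb.in"):
    print(host, addresses(host))
```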
Is this issue the cause of what we're seeing? We were seeing timeouts over the last couple of days; now we're seeing this gzip error.
(and of course thank you for your work making this possible)
Yes, I think the gzip error is related (and I've seen it too). Unfortunately, the CI linters are currently blocking me from doing anything: I can't merge code until they pass, because some of them are marked as required (and I don't have the ability to change those settings directly), so I'm chasing down that problem first.
Okay, I should have a temporary workaround for the CI runners problem shortly. That won't necessarily fix the cache, but it at least means it'll be possible to merge stuff so we can fix the cache, so it's forward progress!
Runners are back up; required jobs are off and staying off in case of more issues while we work on the cache problem.
NVD threw a 403 error in the cache job that I ran; not sure yet whether that's because the API key wasn't passed through correctly or something else.
Okay, I can reproduce the 403 error even when I'm sure the API key is set. I probably need to dig into the API docs, but I need to step away from the computer for a while. If anyone else wants to keep investigating while I'm offline:
- The relevant docs are probably these: https://nvd.nist.gov/developers/vulnerabilities
- The function that's failing is `nvd_count_metadata` in `nvd_api.py`, and just handling the `ClientResponseError` and letting it set the counts to 0 causes different problems, so no lazy fix there.
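If it helps, here's a quick standalone way to see whether the key itself is being rejected. This is only a sketch: it assumes the 2.0 endpoint from the docs above, the `apiKey` header, and an `NVD_API_KEY` environment variable, which may not match what the cache job actually uses.

```python
# Hit the NVD 2.0 API once without a key and once with one, then compare the
# status codes: a 403 even with the key points at the key/headers themselves,
# while a 200 with the key points at how CI passes the key through.
import asyncio
import os

import aiohttp

NVD_API_2_0 = "https://services.nvd.nist.gov/rest/json/cves/2.0"

async def status_for(api_key):
    headers = {"apiKey": api_key} if api_key else {}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(NVD_API_2_0, params={"resultsPerPage": "1"}) as resp:
            return resp.status

async def main():
    key = os.environ.get("NVD_API_KEY")  # assumed variable name for this sketch
    print("without key:", await status_for(None))
    print("with key:   ", await status_for(key))

asyncio.run(main())
```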
I still haven't dug into the gzip issue, but if I had to guess, the mirror got bad data the same way we're getting bad data from api2.
Okay, so the gzip error is because, for some probably terrible reason, we've got a bunch of files on the mirror that are just "this file is unavailable" error responses, and our "what years should we parse?" code only checks whether the files are there.
Currently we're not actually getting data from any year before 2012, but the mirror is trying to store data back to 2002. We should be able to change things at the mirror level and potentially get a workaround, but what I tried first left me with a corrupted database, so it might take a while.
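To make that concrete, this is roughly the kind of check I'd like the year-selection code to do instead of a plain "does the file exist?" test. It's a sketch only; the file name pattern and year range are assumptions, not the real code.

```python
# Gzip files always start with the two magic bytes 0x1f 0x8b; the mirror's
# "this file is unavailable" placeholders won't, so peeking at the header is
# enough to tell a real feed file from junk before trying to parse it.
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"

def is_real_gzip(path: Path) -> bool:
    try:
        with path.open("rb") as f:
            return f.read(2) == GZIP_MAGIC
    except OSError:
        return False  # missing or unreadable counts as "not usable"

def usable_years(cache_dir: Path, first: int = 2002, last: int = 2025) -> list[int]:
    # Hypothetical file name pattern, for illustration only.
    return [
        year
        for year in range(first, last + 1)
        if is_real_gzip(cache_dir / f"nvdcve-1.1-{year}.json.gz")
    ]
```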
Quick summary:
- GitHub Actions cannot talk to mirror.cveb.in at the moment due to a DNS issue. We think switching to v4.mirror.cveb.in will solve this but can't test it because...
- `--use-mirror` can't handle the current mirror setup. I have a lazy fix for this but can't really test it because...
- The cve1.1 data that we use on the mirror currently has 10 years' worth of files that aren't actually JSON and are instead HTML error codes with a binary blob at the end or something. Since most of these files are so old that they seldom get significant updates, we're intending to roll back to the last valid version we have so the mirror still provides data. But this may only be a temporary workaround because...
- The cve2.0 data provided by NVD still seems to have valid files for 2002-2012, so we may need to switch to using that (see #5172).
- So our GitHub Actions CI can't run because it has no data and the cache is outdated, and even our linters weren't running earlier this week because of an unrelated issue (that part has been resolved, so we're linting again at least!)
I'm currently working on a lazy workaround for "don't try to process broken gzip files" in #5173, so I'll probably keep messing with that until the mirror's happier. Once it is, I'll switch over to fixing `--use-mirror` and then hopefully finding a workaround for the DNS problem. But it's all going to be more than a little slow for a few more days, sorry everyone!
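For anyone curious, the guard I have in mind is roughly this shape (a simplified sketch, not the actual #5173 change; the function name is made up):

```python
# Instead of letting one corrupted .json.gz take the whole cache update down,
# treat "not valid gzipped JSON" as "skip this year and log it".
import gzip
import json
import logging
from pathlib import Path

LOGGER = logging.getLogger(__name__)

def load_year_feed(path: Path):
    try:
        with gzip.open(path, "rb") as f:
            return json.load(f)
    # gzip.BadGzipFile is an OSError subclass, so it's covered here too.
    except (OSError, EOFError, json.JSONDecodeError) as err:
        LOGGER.warning("Skipping %s: not usable gzipped JSON (%s)", path, err)
        return None
```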
In theory we're at the point where the cache should be able to update using the new IPv4 address. But what actually happened is that the job stopped after 14 minutes even though the timeout is set to 60, so I really hope we're not finding a new way to run out of memory or something. Re-running with debug on now.
Well, disabling OSV in the cache has solved the problem, yay the cache is updated (minus OSV)! In theory everything except OSV-related tests should be able to run for the moment.
Next up:
- fixing `--use-mirror` to work as expected so we're not all stuck on IPv4 just to make CI work
- someone is already working out a way to let GitHub know they have an IPv6 routing problem (but if any of you want to file a ticket, feel free; they're more likely to fix it if it seems to affect more people)
- figuring out how to fix OSV (I think there's a PR I need to investigate that got side-lined)
But I'm going to go have some lunch and may not be back to working on this problem until tomorrow, so I'm going to leave this open in case anyone needs to let me know things aren't working as expected.