arch-security-tracker

Access to ArchLinux vulnerability feed blocked due to IP restriction

Open: Damian-Mangold opened this issue 5 months ago (4 comments)

Description

We have detected that our IP address is currently blocked from accessing the ArchLinux security domain: https://security.archlinux.org/.

We generate a custom vulnerability feed for ArchLinux based on two sources:

  • The official JSON feed (/issues/all.json)
  • Additional details (such as vulnerability descriptions and publication dates) scraped from individual pages on your website.

As of today, we are unable to connect to the site. Below is a sample error:

curl -v -k https://security.archlinux.org/issues/all.json
* Trying 95.217.239.55:443...
* connect to 95.217.239.55 port 443 failed: Connection refused
* Trying 2a01:4f9:c010:aa84::1:443...
* Immediate connect fail for 2a01:4f9:c010:aa84::1: Network is unreachable
* Failed to connect to security.archlinux.org port 443 after 197 ms: Connection refused

Questions

To better understand and resolve this situation, we would like to ask:

  • How long does the IP block typically last?
  • Are there any access restrictions or rate limits we should be aware of? We are willing to adjust our request frequency or behavior if needed.
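On our side, adjusting request frequency could be as simple as enforcing a minimum delay between consecutive requests. The following is a minimal sketch with a hypothetical `Throttle` helper. The interval is a value we would choose ourselves. It is an assumption, not a figure taken from the tracker's rate-limit policy.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    `min_interval` (seconds) is a hypothetical client-side setting; it is not
    derived from the tracker's documented nginx limits.
    """

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock    # injectable so tests need no real delays
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous request was released

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` immediately before each HTTP request keeps requests at least `min_interval` seconds apart. Because the clock and sleep functions can be injected, the spacing logic can be tested without real delays.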

Expected

  • Receive clarification on block duration and rate limiting policies.
  • Obtain guidance or contact information to restore access.
  • Adjust data retrieval logic to comply with your access policies and avoid future blocks.

Damian-Mangold, Jun 30 '25 14:06

Our nginx rate limits are documented here. As far as I know, the ban lasts one day; I can unban the IPs for now.

However I noticed that every CVE is requested individually? That doesn't seem to scale, nor are we the canonical source for CVEs, so maybe you want to use something else?

jelly, Jun 30 '25 15:06

Hi @jelly. Thank you very much for the information provided.

However I noticed that every CVE is requested individually?

Yes, that’s correct, but we have a local cache mechanism in place. Each CVE is requested only once and then stored locally. We only fetch information for newly published CVEs.
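The following is a minimal sketch of that cache logic. It assumes the cache is a single JSON file keyed by CVE ID. It also assumes `fetch_details` is a hypothetical stand-in for the per-CVE page scraper. Only IDs present in the feed but absent from the cache trigger a request.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def update_cache(cache_path: Path,
                 feed_cve_ids: Iterable[str],
                 fetch_details: Callable[[str], dict]) -> List[str]:
    """Fetch details only for CVE IDs missing from the on-disk cache.

    `fetch_details` is a hypothetical per-CVE fetcher (e.g. the page scraper).
    Returns the IDs fetched during this run, in sorted order.
    """
    cache: Dict[str, dict] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    # Only CVEs not seen in a previous run cause a request.
    new_ids = sorted(set(feed_cve_ids) - set(cache))
    for cve_id in new_ids:
        cache[cve_id] = fetch_details(cve_id)

    # Write to a temporary file first, then replace the cache, so a crash
    # mid-write does not leave a truncated cache file behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    tmp_path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")
    tmp_path.replace(cache_path)
    return new_ids
```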

That doesn't seem to scale nor are we the canonical source for CVEs so maybe you want to use something else?

If you could recommend a more scalable source where we can retrieve CVE descriptions, dates and metrics, we would greatly appreciate it.

Damian-Mangold, Jun 30 '25 17:06


Can you provide us a bit more context why you'd want to scrape the CVE data from us, and not from other first-level providers?

If we can understand what you are trying to actually do, like a user story, that would be very helpful to us to reason about the solution here. If your business needs this, we can guide you through implementation requirements for a pull request that would provide an acceptable API for both sides.

anthraxx, Jun 30 '25 19:06

Hi @anthraxx.

Can you provide us a bit more context why you'd want to scrape the CVE data from us, and not from other first-level providers?

We generate a dedicated vulnerability feed for each vendor from that vendor's own data; our goal is to preserve the original details each vendor provides.


If we can understand what you are trying to actually do, like a user story...

Each vendor publishes vulnerability data in a different format (e.g., OVAL, VEX, or JSON), so we collect the vulnerability content from each vendor and normalize it into a common format: the CVE v5 format.
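The following is a minimal sketch of the normalization step for a single ArchLinux entry. It assumes the data has already been reduced to a CVE ID, a package name, the affected and fixed versions, and an optional description. The field names follow the CVE JSON 5 record layout as we understand it (`cveMetadata`, `containers.cna`, `affected[].versions[]`). The output has not been validated against the official schema. The org UUID and the `dataVersion` value are placeholders, and `to_cve5_record` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical placeholder, not a real CNA org UUID.
PLACEHOLDER_ORG_ID = "00000000-0000-0000-0000-000000000000"


def to_cve5_record(cve_id: str, package: str, affected: str,
                   fixed: Optional[str], description: Optional[str] = None) -> dict:
    """Build a minimal record loosely following the CVE JSON 5 layout."""
    # The affected version is marked "affected"; the fixed version, if any,
    # is marked "unaffected". No version range is inferred.
    versions = [{"version": affected, "status": "affected"}]
    if fixed:
        versions.append({"version": fixed, "status": "unaffected"})

    cna = {
        "providerMetadata": {"orgId": PLACEHOLDER_ORG_ID},
        "affected": [{"vendor": "Arch Linux", "product": package, "versions": versions}],
        # A valid record needs at least one description; an empty list here
        # marks data the JSON feed alone does not provide.
        "descriptions": [{"lang": "en", "value": description}] if description else [],
    }
    return {
        "dataType": "CVE_RECORD",
        "dataVersion": "5.1",  # assumed schema version
        "cveMetadata": {"cveId": cve_id, "assignerOrgId": PLACEHOLDER_ORG_ID,
                        "state": "PUBLISHED"},
        "containers": {"cna": cna},
    }
```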

To generate the ArchLinux vulnerability feed in CVE v5 format, we start with the JSON feed provided by ArchLinux. However, this JSON lacks some key information we require, such as descriptions, references, publication and update dates, and metrics. To fill in this missing information, we retrieve it from the ArchLinux security website.

Here’s a concrete example: suppose we want to generate the CVE v5 entry for CVE-2021-21224. That CVE appears in two entries of the ArchLinux JSON feed:

[
  {
    "name": "AVG-1858",
    "packages": ["vivaldi"],
    "status": "Fixed",
    "severity": "High",
    "type": "arbitrary code execution",
    "affected": "3.7.2218.55-1",
    "fixed": "3.7.2218.58-1",
    "ticket": null,
    "issues": ["CVE-2021-21224"],
    "advisories": []
  },
  {
    "name": "AVG-1843",
    "packages": [
      "chromium"
    ],
    "status": "Fixed",
    "severity": "High",
    "type": "multiple issues",
    "affected": "90.0.4430.72-2",
    "fixed": "90.0.4430.85-1",
    "ticket": null,
    "issues": [
      "CVE-2021-21226",
      "CVE-2021-21225",
      "CVE-2021-21224",
      "CVE-2021-21223",
      "CVE-2021-21222"
    ],
    "advisories": [
      "ASA-202104-7"
    ]
  }
]
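Given feed entries like these, the affected package and version range for each CVE can be extracted by indexing the entries by CVE ID. The following is a minimal sketch. It assumes the feed has been loaded with `json.loads` into a list of dicts, and `index_by_cve` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_cve(avgs: List[dict]) -> Dict[str, List[dict]]:
    """Map each CVE ID to one entry per (AVG, package) pair that lists it."""
    by_cve: Dict[str, List[dict]] = defaultdict(list)
    for avg in avgs:
        for cve_id in avg.get("issues", []):
            for package in avg.get("packages", []):
                by_cve[cve_id].append({
                    "avg": avg["name"],
                    "package": package,
                    "affected": avg.get("affected"),
                    "fixed": avg.get("fixed"),  # None if no fix is recorded
                    "status": avg.get("status"),
                })
    return dict(by_cve)
```

For the two entries above, CVE-2021-21224 maps to two results: `vivaldi` from AVG-1858 and `chromium` from AVG-1843. Descriptions, references, dates and metrics are still absent, which is exactly the gap described next.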

From this we can extract the affected packages and version ranges, but the rest of the contextual data, such as the description, references, publication/update dates, and CVSS metrics, is missing. To retrieve that missing context, we currently parse the web pages:

It would be extremely helpful if this extended information were available directly in the JSON feed, or through an API that exposes the same data as shown on the website. This would allow us to fully automate the enrichment process without relying on scraping.

Damian-Mangold, Jul 01 '25 10:07