vulnerablecode
vulnerablecode copied to clipboard
Add glibc advisories
glibc has started publishing its own advisories:
https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories;h=a0872e990274aee4d881508dad1bce3ea49d4d07;hb=HEAD
May I work on this issue I have gotten a rough Idea about how advisories are being imported. Please see @ziadhany and @TG1999
@harsh098 re:
May I work on this issue I have gotten a rough Idea about how advisories are being imported.
sure, please do: do not hesitate to ask questions as you go.
@harsh098 See this commit by @siddhesh for some details: https://sourceware.org/git/?p=glibc.git;a=commit;h=60c57b8467f11e334e7c7fd07d588c248e93d952
Move CVE information into advisories directory
One of the requirements to becoming a CVE Numbering Authority (CNA) is to publish advisories. Do this by maintaining a file for each CVE fixed in the advisories directory in the source tree. Links to the advisories can then be shared as:
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=advisories/GLIBC-SA-YYYY-NNNN
The file format at the moment is rudimentary and derives from the git commit format, i.e. a subject line and a potentially multi-paragraph description and then tags to describe some meta information. This is a loose format at the moment and could change as we evolve this.
Also add a script process-fixed-cves.sh that processes these advisories and generates a list to add to NEWS at release time.
An example of the format is: https://sourceware.org/git/?p=glibc.git;a=blob;f=advisories/GLIBC-SA-2023-0003;h=0950dc0792d94b28703837dce353a8ca5be0d96b;hb=HEAD
@siddhesh just a FYI, we will start consuming these, which means writing a parser for these... If you like, we could help you with adopting some format that makes it easy to consume your advisories!
Sure @pombredanne I'll be asking wherever I get stuck
@siddhesh just a FYI, we will start consuming these, which means writing a parser for these... If you like, we could help you with adopting some format that makes it easy to consume your advisories!
Thanks, I'd love to hear feedback. We started to but then stayed away from format discussions because we agreed on getting the information right first. Whatever feedback you give should help us clean up on both fronts, format as well as data, based on what you typically publish for advisories.
Hi @pombredanne ,
I had problem figuring out the purl for glibc from glibc.
I had a look at
https://github.com/package-url/purl-spec
I came up with
pkg:generic/glibc@version?commit=commit-id
may I proceed with it ?
@siddhesh and @pombredanne
I wrote a sample parser to parse advisories at https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories;h=a0872e990274aee4d881508dad1bce3ea49d4d07;hb=HEAD
Please check the format, this will be adapted to AdvisoryData in final importer
def parse_advisory(file):
"""
THE Given Function Scrapes the following data from the given advisory files at
https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories
1. CVE_ID
DataType: str
2. Vulnerable Versions
Schema :
"version": "None if no Vulnerable-Backport Exists else report Commit_ID of Vulnerable-Backport"
3. Fixed Versions
Schema :
"version": "Commit_ID that implements the fix"
4. Title of CVE
DataType: str
5. Summary of CVE
DataType: str
The function returns the following data
title: str #Title of CVE
date: str # Date Reported
summary: str #Description of Advisory
cve_desc: List[str] # CVE_ID determined
vuln_versions: Dict[str,str] #List of Vulnerable Versions
fixed_versions: Dict[str,str] #List of Fixed Versions
"""
with open(file) as f:
lines = f.readlines()
title = lines[0]
date: str
cve_desc: str
vuln_versions = dict()
fixed_versions = dict()
summary=""
for i in range(1, len(lines)):
if lines[i]=='\n':
continue
if "CVE-Id:" in lines[i]:
cve_desc = lines[i].split(":")[1].strip()
elif "Public-Date" in lines[i]:
date = lines[i].split(":")[1].strip()
elif "Vulnerable-Commit" in lines[i]:
vulnerable_version = lines[i].split(":")[1].strip().split("(")[1].split(")")[0].strip()
commit = lines[i].split(":")[1].strip().split("(")[0].strip()
vuln_versions[vulnerable_version] = commit
elif "Vulnerable-Backport" in lines[i]:
vulnerable_version = lines[i].split(":")[1].strip().split("(")[1].split(")")[0].strip()
commit = lines[i].split(":")[1].strip().split("(")[0].strip()
vuln_versions[vulnerable_version] = commit
elif "Fix-Commit" in lines[i] or "Fix-Backport" in lines[i]:
fixed_version = lines[i].split(":")[1].strip().split("(")[1].split(")")[0].strip()
commit = lines[i].split(":")[1].strip().split("(")[0].strip()
if fixed_version not in vuln_versions:
vuln_versions[fixed_version]=None
fixed_versions[fixed_version] = commit
else:
summary+=lines[i].strip()+" "
return title, date, summary, cve_desc, vuln_versions, fixed_versions
if __name__ == '__main__' :
for field in parse_advisory("GLIBC-SA-2023-0005"):
print(field)
Please suggest any edge cases I'm missing out on.
Also @pombredanne please help me with purl for the glibc
official sources as asked above.
Also, please let me know if a version can be specified as pre-2.39
in purl.
As, this advisory https://sourceware.org/git/?p=glibc.git;a=blob;f=advisories/GLIBC-SA-2023-0005;h=a518af803a7121038e912fae7fdcf4fe7260d362;hb=HEAD
reports a version as pre-2.39
@harsh098 re:
I wrote a sample parser to parse advisories at https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories;h=a0872e990274aee4d881508dad1bce3ea49d4d07;hb=HEAD
Can you push this in a branch with some tests? Since there is a small number of advisories for now, you should be able to parse them all in the tests. It will be simpler to post comments on the code in a PR.
As for the code, my knee jerk reaction is that you should avoid using indexes for lines as this makes things hard to read and understand.... But please write the unit tests first!
Also I wonder if the format is not something more or less like a Debian/RFC822 format or vaguely YAML'ish. If Debian like, we have a robust parser for Debian control files in https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/deb822.py and https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/
@siddhesh re:
Hi @pombredanne , I had problem figuring out the purl for glibc from glibc. I had a look at https://github.com/package-url/purl-spec I came up with pkg:generic/glibc@version?commit=commit-id may I proceed with it ?
There are multiple ways to handle this:
-
A generic PURL like you suggested but this will need a
download_url
qualifier -
A new PURL type for
glibc
... which IMHO makes the most sense since this is now a CNA and is a rather important package ;) ... we can start using it here and then submit a PR to the purl-spec to add this as a supported type. -
Another option is to instead create a
sourceware
type and have glibc as aname
under this.sourceware
is "forge-like" with https://sourceware.org/glibc/ as a homepage URL as https://www.gnu.org/software/libc/ as a secondary (or the otherway around), http://ftp.gnu.org/gnu/glibc/ as the root for the downloads, the git repo https://sourceware.org/git/?p=glibc.git for the code and advisories. There is not much structured metadata otherwise, but this a super popular package so it does not matter, this is well known enough. -
yet another option is to have a
gnu
type and alibc
as a name.
I think I would like to go with 3. sourceware
but since this would be the way glibc may start to be referenced in the many many tools and specs that use PURL, it would be great to have @siddhesh and some glibc community input. pkg:gnu/libc@<someversion>
vs. pkg:sourceware/glibc@<someversion>
and we can use glibc or libc as a name???
@harsh098 re:
I wrote a sample parser to parse advisories at https://sourceware.org/git/?p=glibc.git;a=tree;f=advisories;h=a0872e990274aee4d881508dad1bce3ea49d4d07;hb=HEAD
Can you push this in a branch with some tests? Since there is a small number of advisories for now, you should be able to parse them all in the tests. It will be simpler to post comments on the code in a PR.
As for the code, my knee jerk reaction is that you should avoid using indexes for lines as this makes things hard to read and understand.... But please write the unit tests first!
Also I wonder if the format is not something more or less like a Debian/RFC822 format or vaguely YAML'ish. If Debian like, we have a robust parser for Debian control files in https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/deb822.py and https://github.com/nexB/debian-inspector/blob/main/src/debian_inspector/
I haven't started work on the importer yet. My first goal is to get the parser right. So, may I write these advisories and tests in a separate repo and add you(@pombredanne ) and @siddhesh as contributor. My Goal is to first get the parser to cover all edge cases.
The format is not vary YAML ish and is much more like Debian/RFC822. But the grammar for the format used by GLIBC is not very well defined unlike RFC822 and the fields are entirely different. That calls for a parser of its own.
A few recommendations would be to get the versioning in the advisories standardised instead of using versioning like pre-2.39
use concrete versions or use <=
better would be to refer to version tags
Also, I wanted to ask @pombredanne how do you fetch the tags in the VCSResponse
class as this would be very valuable in determining affected versions for versions like pre-2.39
and so on ?
@harsh098 Please start a PR early, even if this is only for the parser. We cannot review code otherwise. That's the FOSS way! If anything that will help us understand any issues with the format.
And wrt. the Git tags and in general the glibc versions there are two ways to get versions:
- Parse http://ftp.gnu.org/gnu/glibc/ as HTML. We have code in PurlDB to do this in https://github.com/search?q=repo%3AnexB%2Fpurldb href&type=code way too many places... we should have a shared function instead. An example is this https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/etc/scripts/utils_thirdparty.py#L1739 or OpenSSL releases https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/minecode/visitors/openssl.py#L38
- Collect Git tags either from a clone. There is code in FetchCode for this. And also in PurlDB https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/packagedb/find_source_repo.py#L427 We should have only one way in FetchCode instead. It boils down to running https://git-scm.com/docs/git-ls-remote.html (or similar through a library call)
@keshav-space @TG1999 do you have some hints on which function/code to call for this?
Note that ideally glibc would expose a simple way to list ALL the relevant released versions. There is a long tail of legacy release tags from CVS, Fedora tags, various heads and more that we should likely exclude, going back 30 years and to the CVS days. See https://sourceware.org/git/?p=glibc.git;a=tags
@siddhesh The meaning of these two tags is obvious:
Vulnerable-Commit: 973fe93a5675c42798b2161c6f29c01b0e243994 (pre-2.39)
Fix-Commit: ec6b95c3303c700eb89eebeda2d7264cc184a796 (2.39)
What about these other two?
Vulnerable-Backport: e09ee267c03e3150c2c9ba28625ab130705a485e (2.34)
Fix-Backport: 8006457ab7e1cd556b919f477348a96fe88f2e49 (2.34)
A search either leads to the glibc advisories or -- in a funny tautological way -- to this issue ;)
@harsh098 Please start a PR early, even if this is only for the parser. We cannot review code otherwise. That's the FOSS way! If anything that will help us understand any issues with the format.
And wrt. the Git tags and in general the glibc versions there are two ways to get versions:
- Parse http://ftp.gnu.org/gnu/glibc/ as HTML. We have code in PurlDB to do this in https://github.com/search?q=repo%3AnexB%2Fpurldb href&type=code way too many places... we should have a shared function instead. An example is this https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/etc/scripts/utils_thirdparty.py#L1739 or OpenSSL releases https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/minecode/visitors/openssl.py#L38
- Collect Git tags either from a clone. There is code in FetchCode for this. And also in PurlDB https://github.com/nexB/purldb/blob/e1a20006b513a0379e69a69624da4e614a4f27dd/packagedb/find_source_repo.py#L427 We should have only one way in FetchCode instead. It boils down to running https://git-scm.com/docs/git-ls-remote.html (or similar through a library call)
@keshav-space @TG1999 do you have some hints on which function/code to call for this?
Note that ideally glibc would expose a simple way to list ALL the relevant released versions. There is a long tail of legacy release tags from CVS, Fedora tags, various heads and more that we should likely exclude, going back 30 years and to the CVS days. See https://sourceware.org/git/?p=glibc.git;a=tags
@pombredanne
Turns out I need the output of
git describe --tags <commit-id>
to find the first tag that contains the commit
@siddhesh re:
Turns out I need the output of git describe --tags
to find the first tag that contains the commit
Can you elaborate as to why?
@pombredanne The ftp site only contains the compiled artifacts after the vulnerability fix this means there is no way to determine if the package was ever affected from the ftp mirror. For example
Vulnerable-Backport: e09ee267c03e3150c2c9ba28625ab130705a485e (2.34)
Fix-Backport: 8006457ab7e1cd556b919f477348a96fe88f2e49 (2.34)
$ git describe --tags e09ee267c03e3150c2c9ba28625ab130705a485e
glibc-2.34-420-ge09ee267c0
$ git describe --tags 8006457ab7e1cd556b919f477348a96fe88f2e49
glibc-2.34-421-g8006457ab7
produces the exact version strings where a vulnerability was encountered and when it was fixed this is particularly useful in separating the vulnerable and non-vulnerable versions of the same version string 2.34
As on the ftp mirror there is only the fixed version. But what about users who downloaded the version prior to the fix ?
Source :- https://ftp.gnu.org/gnu/glibc/
A much important thing to consider is the vulnerability was reported first in 2.38
. This makes a really high chance that an end user still uses a vulnerable version provided the user has not installed the required patched updates!
But the grammar for the format used by GLIBC is not very well defined unlike RFC822 and the fields are entirely different. That calls for a parser of its own.
@harsh098 Please reuse the debian-inspector code at the minimum, as I would rather avoid creating a custom parser entirely if the format will change, which is likely. One simple thing could be to turn this format in an RFC822/Deb822 format upstream to make this super simple to handle, rather than fixing it here.
When facing a wall, you can try to climb the wall, but this is not the only option! .... just can walking around the wall or removing the wall works too ;) let's try to remove the wall ;)
Please try and report here what does not work with debian-inspector and what would be the likely small adjustments needed.
@harsh098 You wrote:
So, may I write these advisories and tests in a separate repo
Create a pull request and add these as test files in this PR. This is just a few at this stage.
@harsh098 re: https://github.com/nexB/vulnerablecode/issues/1362#issuecomment-1886836417
There are multiple concerns to address:
- Given a release tarball in https://ftp.gnu.org/gnu/glibc/ ... which vulnerability affects a version or version range
- Given a commit or tag in Git repo (which will be different PURLs) ... which vulnerability affects this commit or commit range or tag
- Given a vulnerable version or commit or tag, which patch to apply or which or which tag version to upgrade to that has the fix.
Getting the version/tag/commit looks like is all there in the advisory. Not need to checkout and further git describe IMHO. Expanding the version ranges and commit ranges is a separate concern that may require a full clone and running some extra git queries.... and this would be an Improver IMHO, not the business of the initial advisory Importer
So let's focus on the obvious and simple thing first: let's understand and parse the advisories and extract the data just from them, and leave on the side the Git thing entirely.
Also let me restate one more time that we really need a PR discuss the code.
I'll raise a PR surely after 19th Jan 2024 I'm busy with my exams simutaneously till 5th Feb.
Please suggest any edge cases I'm missing out on.
As far as the current tags are concerned, AFAICT you're capturing all of them, thanks.
@siddhesh The meaning of these two tags is obvious:
Vulnerable-Commit: 973fe93a5675c42798b2161c6f29c01b0e243994 (pre-2.39) Fix-Commit: ec6b95c3303c700eb89eebeda2d7264cc184a796 (2.39)
What about these other two?
Vulnerable-Backport: e09ee267c03e3150c2c9ba28625ab130705a485e (2.34) Fix-Backport: 8006457ab7e1cd556b919f477348a96fe88f2e49 (2.34)
A search either leads to the glibc advisories or -- in a funny tautological way -- to this issue ;)
I knew I should have written a README ;)
- The version in the round brackets indicate the version number;
pre-
is typically for bugs that got introduced in unreleased code. You shouldn't see it too often; we had to put it in for this one advisory that we ended up putting out because we thought that the vulnerable commit may have been cherry-picked by distributions. -
Vulnerable-Backport
is to indicate a backported patch that introduced a vulnerability in a release branch. Basically we create a release/x.yy/master branch whenever we do a release and distribution maintainers often backport fixes to these branches to make it easier to stay in sync with the upstream repository. TheVulnerable-Backport
tag points to the commit(s) that introduced the vulnerability in that branch - Likewise,
Fix-Backport
refers to the commit that fixes the vulnerability in the branch mentioned at the end of that line in brackets.
produces the exact version strings where a vulnerability was encountered and when it was fixed this is particularly useful in separating the vulnerable and non-vulnerable versions of the same version string
2.34
As on the ftp mirror there is only the fixed version. But what about users who downloaded the version prior to the fix ?
There's a caveat here. Commits referenced in Fix-Commit
will be present in the tarballs in the FTP mirror. However, commits referenced in Fix-Backport
won't be. Those will be present on the release/x.yy/master
branch, since the backports happen after a release tarball has been generated and uploaded.
In general, you should be interpreting all advisory information based on the git repository and not the tarball.
A much important thing to consider is the vulnerability was reported first in
2.38
. This makes a really high chance that an end user still uses a vulnerable version provided the user has not installed the required patched updates!
Right, the updates don't go into the tarball. The tarball are for distribution consumption and after that, it is their responsibility to backport whatever they need from the release/x.yy/master
branch.
I think I would like to go with 3.
sourceware
but since this would be the way glibc may start to be referenced in the many many tools and specs that use PURL, it would be great to have @siddhesh and some glibc community input.pkg:gnu/libc@<someversion>
vs.pkg:sourceware/glibc@<someversion>
and we can use glibc or libc as a name???
Could it be pkg:gnu/glibc@<someversion>
? The gnu namespace seems like the right thing here since it's a GNU project and the package is known everywhere (except on savannah, the forge where the libc web content lies, probably a historical artifact) as glibc. Does it have to correspond to a landing page?
Also FYI, the advisories
directory will not be present in any release tarballs or non-main branches, to make sure there's only one source of truth (i.e. the advisories directory on the main branch of the git repository) for advisories.
As an upstream glibc steward I support @siddhesh's suggestion to use pkg:gnu/glibc
for the url.
@siddhesh re:
I knew I should have written a README ;) You just did! Thanks!
And you later wrote:
Could it be pkg:gnu/glibc@
? The gnu namespace seems like the right thing here since it's a GNU project and the package is known everywhere (except on savannah, the forge where the libc web content lies, probably a historical artifact) as glibc. Does it have to correspond to a landing page?
This is perfect. It does not have to correspond to a landing page. It has to be obvious, self-evident which will be the case with this one!
@codonell you then seconded @siddhesh :
As an upstream glibc steward I support @siddhesh's suggestion to use pkg:gnu/glibc for the url.
Here is what I suggest:
- We will start playing with this PURL here
- I will later submit a PR upstream at the purl-spec to add this there (I maintain Package URL). I will ping you there when this happen.
- You can decide to start using it in your advisories if you like, or not. Everyone will be able to do this too.
Guys thank you ++ for chiming in! You rock, and are true code rockstars! :bow: Thank you for your work on glibc. :heart:
@harsh098 here is a simpler and more expressive snippet to parse the advisories in their current format:
glibc_advisory = """tunables: local privilege escalation through buffer overflow
If a tunable of the form NAME=NAME=VAL is passed in the environment of a
setuid program and NAME is valid, it may result in a buffer overflow,
which could be exploited to achieve escalated privileges. This flaw was
introduced in glibc 2.34.
CVE-Id: CVE-2023-4911
Public-Date: 2023-10-03
Vulnerable-Commit: 2ed18c5b534d9e92fc006202a5af0df6b72e7aca (2.34)
Fix-Commit: 1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa (2.39)
Fix-Backport: dcc367f148bc92e7f3778a125f7a416b093964d9 (2.34)
Fix-Backport: c84018a05aec80f5ee6f682db0da1130b0196aef (2.35)
Fix-Backport: 22955ad85186ee05834e47e665056148ca07699c (2.36)
Fix-Backport: b4e23c75aea756b4bddc4abcf27a1c6dca8b6bd3 (2.37)
Fix-Backport: 750a45a783906a19591fb8ff6b7841470f1f5701 (2.38)
"""
in_description = True
description = ""
parsed = []
for line in glibc_advisory.splitlines(True):
if line.startswith("CVE-Id:"):
in_description = False
if in_description:
description += line
else:
line = line.strip()
name, _, value = line.partition(": ")
value = value.strip()
if name.endswith(("Commit", "Backport",)):
commit, _ , version = value.partition(" ")
version = version.strip(")(")
parsed.append({"name": name, "commit": commit, "version": version})
else:
parsed.append({"name": name, "value": value})
parsed.insert(0, {"name": "description", "value": description.strip()})
import json
print(json.dumps(parsed, indent=2))
yields:
[
{
"name": "description",
"value": "tunables: local privilege escalation through buffer overflow\n\nIf a tunable of the form NAME=NAME=VAL is passed in the environment of a\nsetuid program and NAME is valid, it may result in a buffer overflow,\nwhich could be exploited to achieve escalated privileges. This flaw was\nintroduced in glibc 2.34."
},
{
"name": "CVE-Id",
"value": "CVE-2023-4911"
},
{
"name": "Public-Date",
"value": "2023-10-03"
},
{
"name": "Vulnerable-Commit",
"commit": "2ed18c5b534d9e92fc006202a5af0df6b72e7aca",
"version": "2.34"
},
{
"name": "Fix-Commit",
"commit": "1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa",
"version": "2.39"
},
{
"name": "Fix-Backport",
"commit": "dcc367f148bc92e7f3778a125f7a416b093964d9",
"version": "2.34"
},
{
"name": "Fix-Backport",
"commit": "c84018a05aec80f5ee6f682db0da1130b0196aef",
"version": "2.35"
},
{
"name": "Fix-Backport",
"commit": "22955ad85186ee05834e47e665056148ca07699c",
"version": "2.36"
},
{
"name": "Fix-Backport",
"commit": "b4e23c75aea756b4bddc4abcf27a1c6dca8b6bd3",
"version": "2.37"
},
{
"name": "Fix-Backport",
"commit": "750a45a783906a19591fb8ff6b7841470f1f5701",
"version": "2.38"
}
]
The output could be less verbose though.
Or even simpler:
summary, _, tail = glibc_advisory.partition("\n\n")
description, _, metadata = tail.partition("\n\n")
parsed = [
{"name": "summary", "value": summary},
{"name": "description", "value": description}
]
for line in metadata.splitlines():
name, _, value = line.partition(": ")
if name.endswith(("Commit", "Backport",)):
commit, _ , version = value.partition(" ")
parsed.append({"name": name, "commit": commit, "version": version.strip(")(")})
else:
parsed.append({"name": name, "value": value})
Still simpler as a clean and clear one liner with a small modification of the format to make it a valid email header/message:
glibc_advisory = """Subject: tunables: local privilege escalation through buffer overflow
Description: If a tunable of the form NAME=NAME=VAL is passed in the environment of a
setuid program and NAME is valid, it may result in a buffer overflow,
which could be exploited to achieve escalated privileges. This flaw was
introduced in glibc 2.34.
CVE-Id: CVE-2023-4911
Public-Date: 2023-10-03
Vulnerable-Commit: 2ed18c5b534d9e92fc006202a5af0df6b72e7aca (2.34)
Fix-Commit: 1056e5b4c3f2d90ed2b4a55f96add28da2f4c8fa (2.39)
Fix-Backport: dcc367f148bc92e7f3778a125f7a416b093964d9 (2.34)
Fix-Backport: c84018a05aec80f5ee6f682db0da1130b0196aef (2.35)
Fix-Backport: 22955ad85186ee05834e47e665056148ca07699c (2.36)
Fix-Backport: b4e23c75aea756b4bddc4abcf27a1c6dca8b6bd3 (2.37)
Fix-Backport: 750a45a783906a19591fb8ff6b7841470f1f5701 (2.38)
"""
import email
import json
print(json.dumps(list(email.message_from_string(glibc_advisory).items()), indent=2))