vulnerablecode
Track licenses for each data pointer and record
We need to decide what we want to do wrt. licenses for data. See https://cve.mitre.org/about/termsofuse.html for instance for the CVE/NVD. There are a few ways to think about this:
- we are storing only pointers, so there are no license issues to track as we are not storing third-party data
- we are storing only pointers and caching existing data, so we should handle this in a way similar to what search engines do
- we are storing data, so we should track licenses either per-record or per-source
Each of these cases may have an impact on the license of the resulting data, which should be as open as possible (ideally something like CC0-1.0)
Another take on this topic: we are building open tools to collect, aggregate and redistribute a free and open software vulnerability database. At a high level, we keep pointers/references and relate many vulnerability records to the software package versions they impact.
A pointer/reference is typically a URL and an ID to vulnerability information and to packages such as these below that are all related together:
- https://bugzilla.redhat.com/show_bug.cgi?id=813428
- https://nvd.nist.gov/vuln/detail/CVE-2012-0217
- https://lists.xen.org/archives/html/xen-announce/2012-06/msg00001.html
- https://www.debian.org/security/2012/dsa-2508
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677297
- https://www.openwall.com/lists/oss-security/2012/06/12/1 ...
There are a few areas where we need some help and need to make decisions soon:
- we want the data we re-distribute to be as open as possible (ideally some kind of CC0),
- the data we collect is itself under a variety of more or less open licenses, but all available publicly, and we need to decide:
  - what data we can aggregate or not based on licenses?
  - what if we keep only pointers/URLs as opposed to actual details of the records?
  - should we track the license of individual records or not? (and also based on what data we keep)
I did have an extensive chat with @LeChasseur on that topic and some of the key points are:
- while a single vulnerability record may not be copyrightable, databases can be copyrighted
- on top of that, Europe has a notion of "sui generis" rights that may apply when data originally comes from Europe. It does not extend to non-European data
- the cleanest way to promote reuse for the data we create (and for any right in the aggregate) is to use a public domain CC0-1.0 dedication. We could request but not demand some attribution. Anything else makes the data rather problematic to reuse, and we want to promote maximum reuse.
- at a minimum, we would need to track the license, if any, of each data source.
- some data sources may be out of reach for us to redistribute as aggregates because of their licenses
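To make the per-source versus per-record question more concrete, here is a minimal sketch of how a record-level license could override a source-level one. All class names, field names and example values are hypothetical and are not taken from the VulnerableCode code base; license values are assumed to be SPDX-style identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """A hypothetical upstream source of vulnerability data."""
    name: str
    url: str
    # SPDX-style license identifier, or None when the source states no license.
    license: Optional[str] = None


@dataclass(frozen=True)
class Record:
    """A hypothetical pointer to one vulnerability record."""
    identifier: str
    url: str
    source: DataSource
    # Set only when a record states a license that differs from its source.
    license: Optional[str] = None

    def effective_license(self) -> Optional[str]:
        """Return the record-level license if present, else the source-level one."""
        return self.license or self.source.license


# Made-up example data, for illustration only.
feed = DataSource("example-source", "https://example.org/feed", license="CC-BY-4.0")
plain = Record("EX-1", "https://example.org/EX-1", feed)
override = Record("EX-2", "https://example.org/EX-2", feed, license="CC-BY-NC-4.0")
unlicensed = Record(
    "EX-3",
    "https://example.org/EX-3",
    DataSource("no-license-source", "https://example.org/other"),
)
```

With this shape, tracking per source is the default and per-record tracking is only needed where a source mixes licenses; a `None` result marks data whose license still has to be clarified.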
Some pointers about possibly problematic sources:
- RedHat victims: https://github.com/fabric8-analytics/victims-cve-db data is CC-BY-SA (was AGPL https://github.com/victims/victims-cve-db/commit/86df6d7c14c11c41a60db8ebffdaa8409f856ead#diff-e0903c27550bef075e6af0bb73039f25 )
- ~~Suse CVRF and OVAL data is CC-BY-NC which makes it completely non-open and impractical to reuse. See ftp://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml~~
  - SUSE is now confirmed as CC-BY-4.0 (See below)
- ~~RedHat RHSA CVRF and OVAL data only has a "Copyright © 2019 Red Hat, Inc. All rights reserved." https://www.redhat.com/security/data/cvrf/2019/cvrf-rhsa-2019-3300.xml https://www.redhat.com/security/data/oval/~~
  - Per https://access.redhat.com/security/data the license is now confirmed as CC-BY-4.0
- Pyup https://github.com/pyupio/safety-db is also CC-BY-NC
- Snyk was AGPL but this is completely stale and non-open data now so we will not use it: https://github.com/snyk/vulnerabilitydb/tree/43c0cce02216dac20ef22441c6a3f50c995f5d55
- https://github.com/fabric8-analytics/cvedb does not have a clear license for now but may be entirely aggregated from other sources
This is best handled a tad later, once we decide how we will implement license tracking, possibly in #123. This is important enough to defer to the next milestone.
Slightly off-topic comment, but here we go :)
I am no longer involved with https://github.com/fabric8-analytics/cvedb but it should only contain data collected from NVD (there is a dummy bot which scans NVD once a day and opens pull-requests in the database repository for further review/human curation). AFAIK, the team never decided on what license the database should use.
@msrb Thank you ++ for chiming in as this is quite useful
@msrb I'd love to reuse, integrate and further the code of https://github.com/fabric8-analytics/cvejob too... let me enter a ticket there to ask about the license of the code too... or would you know?
@msrb Your diagram at https://github.com/fabric8-analytics/cvejob/blob/master/docs/internals.md is great! :+1:
Sources we already use whose license has not been clarified. We need to reach out to these sources or dig deeper:
- [x] https://gitlab.alpinelinux.org/alpine/infra/alpine-secdb : based on https://gitlab.alpinelinux.org/alpine/infra/docker/secdb/-/blob/master/license.txt this is CC-BY-SA-4.0
- [ ] https://security.archlinux.org/json
- [x] http://ftp.suse.com/pub/projects/security/yaml/ : see below everything is now CC-BY-4.0
- [ ] https://usn.ubuntu.com/usn-db/database-all.json.bz2
- [x] https://api.github.com/graphql is CC-BY-4.0 per https://github.com/github/advisory-database/blob/main/LICENSE.md
- [x] https://anongit.gentoo.org/git/data/glsa.git answer maybe found on https://www.gentoo.org/support/security/ which states that " The contents of this document, unless otherwise expressly stated, are licensed under the CC-BY-SA-3.0 license."
See also https://github.com/nexB/scancode-toolkit/issues/2143 for the Rubysec data
See also #277
SUSE CVRF and OVAL data is CC-BY-NC which makes it completely non-open and impractical to reuse. See https://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml
SUSE has changed (some? all?) its vulnerability data license from CC-BY-NC-SA to CC-BY
- https://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml : "The CVRF data is provided by SUSE under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)."
- https://ftp.suse.com/pub/projects/security/oval/ : "SUSE OVAL data is supplied under Creative Commons license, with Attribution (CC-BY-4.0)."
- https://www.suse.com/support/security/oval/ : "The OVAL data is provided by SUSE under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)."
Though there is still some global ambiguity based on the text of https://ftp.suse.com/pub/projects/security/cvrf-cve/LICENSE :
"The SUSE CVRF data is provided by SUSE under the Creative Commons license, with Attribution for Non Commercial use: CC-BY-4.0 https://creativecommons.org/licenses/by/4.0/"
This text refers to CC-BY but still mentions Non Commercial use, as in CC-BY-NC.
And based on https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2015%3A0255-1.xml or https://ftp.suse.com/pub/projects/security/cvrf1.2/cvrf-opensuse-su-2015%3A0225-1.xml we still have some records left with this CC-BY-NC:
"The CVRF data is provided by SUSE under the Creative Commons License 4.0 with Attribution for Non-Commercial usage (CC-BY-NC-4.0)."
But even in the same data source we have other CC-BY licenses in https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2016%3A1623-1.xml :
"Copyright SUSE LLC under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)"
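As an illustration of why mixed notices are awkward to consume automatically, here is a rough sketch that classifies a notice string like the ones quoted above. The function name and the two regular expressions are my own assumptions, tuned only to the wording seen in these SUSE files; this is not an existing VulnerableCode or SUSE tool.

```python
import re
from typing import Optional, Tuple

# Matches SPDX-style Creative Commons identifiers such as CC-BY-4.0 or CC-BY-NC-4.0.
CC_ID = re.compile(r"\bCC-BY(?:-NC)?(?:-SA)?-\d\.\d\b")
# Matches "Non Commercial", "Non-Commercial" or "NonCommercial", ignoring case.
NON_COMMERCIAL = re.compile(r"\bnon[\s-]?commercial\b", re.IGNORECASE)


def classify_notice(notice: str) -> Tuple[Optional[str], bool]:
    """Return (license_id, is_ambiguous) for a license notice string.

    license_id is None when no identifier is found. A notice is flagged as
    ambiguous when its wording says non-commercial but its identifier has
    no NC component.
    """
    match = CC_ID.search(notice)
    license_id = match.group(0) if match else None
    says_nc = bool(NON_COMMERCIAL.search(notice))
    has_nc_id = license_id is not None and "-NC" in license_id
    ambiguous = license_id is not None and says_nc and not has_nc_id
    return license_id, ambiguous


# The three notices quoted in this thread.
directory_license = (
    "The SUSE CVRF data is provided by SUSE under the Creative Commons license, "
    "with Attribution for Non Commercial use: CC-BY-4.0 "
    "https://creativecommons.org/licenses/by/4.0/"
)
old_record = (
    "The CVRF data is provided by SUSE under the Creative Commons License 4.0 "
    "with Attribution for Non-Commercial usage (CC-BY-NC-4.0)."
)
new_record = (
    "Copyright SUSE LLC under the Creative Commons License 4.0 "
    "with Attribution (CC-BY-4.0)"
)

for text in (directory_license, old_record, new_record):
    print(classify_notice(text))
```

A check like this would have to run on every record before import, which is exactly the per-record cherry picking that a single consistent license avoids.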
So this is a bit messy. I am reaching out to SUSE security by email.
I sent this to [email protected]:
Hi:

Thank you for changing most of your vulnerability data license to CC-BY somewhat recently. Yet there are still some problems with leftover CC-BY-NC. Because of this, it makes the data difficult to consume automatically as each record need to be cherry picked based on its licenses allowing or not allowing usage (CC-BY-NC essentially prohibits any usage beyond mere reading).

May I suggest to use the plain CC-BY license consistently everywhere? Or update your web pages and top level license notices to be consistent to alert that there is a mix of CC-BY and CC-BY-NC?
Thank you for your kind consideration!
For extra details, please see https://github.com/nexB/vulnerablecode/issues/63#issuecomment-1134425600 for reference that I am pasting here:
And we got a super speedy reply from SUSE security team:
Sorry for this oversight that it was not done consistently. (the ones affected were not being regenerated by the tooling.) I now did a massive replacement, and all of cvrf files should be fine. Also adjusted the LICENSE files in the directories.
Thank you ++ SUSE!
I chatted on the side with Ubuntu folks on their IRC channel, #ubuntu-security on libera.chat
@stevebeattie FYI
This is about
- #1051
- #754
pombreda> (Philippe Ombredanne)
- Hiya :) What is the license for the security data at https://ubuntu.com/security/notices (and the usn-db dump) 6:51 PM
- And the license for https://ubuntu.com/security/oval reports itself as GPL and I do not know what to do for data with a GPL. 6:51 PM
- Who to talk to? 6:51 PM
- FWIW, we aggregate this in our little project at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu.py and https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu_usn.py and we like to have a license for that! 6:52 PM
- Debian did not have a license ... but that was clarified at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/debian_oval.py

Steve Beattie
- pombreda: hey, thanks for the question, and apologies that we don't have explicit terms on this stuff. In general, we want people to be able to consume, integrate, aggregate, and use the data presented in tools like nexB (so long as the data is represented accurately). 7:19 PM
- I'll poke people internally about getting more explicit statements in place.