
Track licenses for each data pointer and record

Open pombredanne opened this issue 4 years ago • 13 comments

We need to decide what we want to do wrt. licenses for data. See https://cve.mitre.org/about/termsofuse.html for instance for the CVE/NVD. There are a few ways to think about this:

  1. we are storing only pointers, so there are no license issues to track as we are not storing third-party data
  2. we are storing only pointers and caching existing data, so we should handle this in a way similar to what search engines do
  3. we are storing data, so we should track licenses either per record or per source

Each of these cases may have an impact on the resulting data license, which should be as open as possible (ideally CC0-1.0)

pombredanne avatar Sep 26 '19 08:09 pombredanne

Another take on this topic: we are building open tools to collect, aggregate and redistribute a free and open software vulnerability database. At a high level we are keeping pointers/references and relating together many vulnerability records and the software package versions they impact.

A pointer/reference is typically a URL and an ID to vulnerability information and to packages such as these below that are all related together:

  • https://bugzilla.redhat.com/show_bug.cgi?id=813428
  • https://nvd.nist.gov/vuln/detail/CVE-2012-0217
  • https://lists.xen.org/archives/html/xen-announce/2012-06/msg00001.html
  • https://www.debian.org/security/2012/dsa-2508
  • https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677297
  • https://www.openwall.com/lists/oss-security/2012/06/12/1 ...

We have a few areas where we need some help and need to make some decisions soon:

  • we want the data we redistribute to be as open as possible (ideally some kind of CC0),

  • the data we collect is itself under a variety of more or less open licenses, all publicly available, and we need to decide:

    • what data can we aggregate or not, based on licenses?

    • what if we keep only pointers/URLs as opposed to the actual details of the records?

    • should we track the license of individual records or not? (this also depends on what data we keep)

pombredanne avatar Sep 26 '19 08:09 pombredanne

I did have an extensive chat with @LeChasseur on that topic and some of the key points are:

  1. while a single vulnerability record may not be copyrightable, databases can be copyrighted
  2. on top of that, Europe has a notion of "sui generis" rights that may apply when data originally comes from Europe. It does not extend to non-European data
  3. the cleanest way to promote reuse for the data we create (and for any right in the aggregate) is to use a public domain CC0-1.0 dedication. We could request but not demand some attribution. Anything else makes the data rather problematic to reuse, and we want to promote maximum reuse.
  4. we would need to track, at a minimum, the license (if any) of each data source.
  5. some data sources may be out of reach for us to redistribute as aggregates because of their licenses
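To make the "per source or per record" question concrete, here is a minimal sketch of one way the tracking could be modeled: each source carries a default license, and a record may override it. This is not VulnerableCode's actual data model. The class names, field names and the example source are all hypothetical and only illustrate the fallback logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """A hypothetical upstream source of vulnerability data."""
    name: str
    url: str
    # License identifier for the source as a whole; None when unknown.
    license: Optional[str] = None


@dataclass(frozen=True)
class Record:
    """A hypothetical vulnerability record collected from a source."""
    record_id: str
    source: DataSource
    # Per-record license; only set when it differs from the source's.
    license: Optional[str] = None


def effective_license(record: Record) -> Optional[str]:
    """Return the record-level license if set, else the source-level one."""
    return record.license or record.source.license


def needs_clarification(records: List[Record]) -> List[Record]:
    """Return the records for which no license is known at either level."""
    return [r for r in records if effective_license(r) is None]


# Made-up example data, for illustration only.
licensed = DataSource("example-licensed", "https://example.org/a", "CC-BY-4.0")
unlicensed = DataSource("example-unlicensed", "https://example.org/b")

records = [
    Record("rec-1", licensed),                          # inherits CC-BY-4.0
    Record("rec-2", licensed, license="CC-BY-NC-4.0"),  # per-record override
    Record("rec-3", unlicensed),                        # nothing known
]

for r in records:
    print(r.record_id, effective_license(r))
```

With this shape, tracking only per source is the degenerate case where no record sets its own license, so the per-record question can be decided later without changing the model.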

Some pointers about possibly problematic sources:

  • RedHat victims: https://github.com/fabric8-analytics/victims-cve-db data is CC-BY-SA (was AGPL https://github.com/victims/victims-cve-db/commit/86df6d7c14c11c41a60db8ebffdaa8409f856ead#diff-e0903c27550bef075e6af0bb73039f25 )

  • ~~SUSE CVRF and OVAL data is CC-BY-NC, which makes it completely non-open and impractical to reuse. See ftp://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml~~

    • SUSE is now confirmed as CC-BY-4.0 (See below)
  • ~~RedHat RHSA CVRF and OVAL data only has a "Copyright © 2019 Red Hat, Inc. All rights reserved." https://www.redhat.com/security/data/cvrf/2019/cvrf-rhsa-2019-3300.xml https://www.redhat.com/security/data/oval/~~

    • Per https://access.redhat.com/security/data the license is now confirmed as CC-BY-4.0
  • Pyup https://github.com/pyupio/safety-db is also CC-BY-NC

  • Snyk was AGPL but this is completely stale and non open data now so we will not use it https://github.com/snyk/vulnerabilitydb/tree/43c0cce02216dac20ef22441c6a3f50c995f5d55

  • https://github.com/fabric8-analytics/cvedb does not have a clear license for now but may be entirely aggregated from other sources

pombredanne avatar Nov 05 '19 13:11 pombredanne

This is best handled a tad later, once we decide how we will implement license tracking, possibly in #123. This is important enough to defer to the next milestone.

pombredanne avatar Nov 18 '19 09:11 pombredanne

Slightly off-topic comment, but here we go :)

I am no longer involved with https://github.com/fabric8-analytics/cvedb but it should only contain data collected from NVD (there is a dummy bot which scans NVD once a day and opens pull-requests in the database repository for further review/human curation). AFAIK, the team never decided on what license the database should use.

msrb avatar Jan 14 '20 13:01 msrb

@msrb Thank you ++ for chiming in as this is quite useful

pombredanne avatar Jan 15 '20 11:01 pombredanne

@msrb I'd love to reuse, integrate and further the code of https://github.com/fabric8-analytics/cvejob too... let me enter a ticket there to ask about the license of the code too... or would you know?

pombredanne avatar Jan 15 '20 11:01 pombredanne

@msrb Your diagram at https://github.com/fabric8-analytics/cvejob/blob/master/docs/internals.md is great! :+1:

pombredanne avatar Jan 15 '20 11:01 pombredanne

Sources that we already use without a clarified license. We need to reach out to these sources or dig deeper:

  • [x] https://gitlab.alpinelinux.org/alpine/infra/alpine-secdb : based on https://gitlab.alpinelinux.org/alpine/infra/docker/secdb/-/blob/master/license.txt this is CC-BY-SA-4.0
  • [ ] https://security.archlinux.org/json
  • [x] http://ftp.suse.com/pub/projects/security/yaml/ : see below everything is now CC-BY-4.0
  • [ ] https://usn.ubuntu.com/usn-db/database-all.json.bz2
  • [x] https://api.github.com/graphql is CC-BY-4.0 per https://github.com/github/advisory-database/blob/main/LICENSE.md
  • [x] https://anongit.gentoo.org/git/data/glsa.git : the answer may be found at https://www.gentoo.org/support/security/ which states that "The contents of this document, unless otherwise expressly stated, are licensed under the CC-BY-SA-3.0 license."

sbs2001 avatar Jul 27 '20 14:07 sbs2001

See also https://github.com/nexB/scancode-toolkit/issues/2143 for the Rubysec data

pombredanne avatar Sep 10 '20 08:09 pombredanne

See also #277

pombredanne avatar Nov 20 '20 10:11 pombredanne

SUSE CVRF and OVAL data is CC-BY-NC, which makes it completely non-open and impractical to reuse. See https://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml

SUSE has changed (some? all?) of its vulnerability data licenses from CC-BY-NC-SA to CC-BY

  • https://ftp.suse.com/pub/projects/security/cvrf/cvrf-suse-su-2017%3A2968-1.xml : "The CVRF data is provided by SUSE under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)."

  • https://ftp.suse.com/pub/projects/security/oval/ : "SUSE OVAL data is supplied under Creative Commons license, with Attribution (CC-BY-4.0)."

  • https://www.suse.com/support/security/oval/ : "The OVAL data is provided by SUSE under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)."

Though there is still some global ambiguity based on the text of https://ftp.suse.com/pub/projects/security/cvrf-cve/LICENSE

The SUSE CVRF data is provided by SUSE under the Creative Commons license, with Attribution for Non Commercial use: CC-BY-4.0 https://creativecommons.org/licenses/by/4.0/

This text refers to CC-BY but still mentions Non Commercial use, which belongs to CC-BY-NC.

And based on https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2015%3A0255-1.xml or https://ftp.suse.com/pub/projects/security/cvrf1.2/cvrf-opensuse-su-2015%3A0225-1.xml we still have some records left with this CC-BY-NC:

The CVRF data is provided by SUSE under the Creative Commons License 4.0 with Attribution for Non-Commercial usage (CC-BY-NC-4.0).

But even in the same data source we have other CC-BY licenses in https://ftp.suse.com/pub/projects/security/cvrf/cvrf-opensuse-su-2016%3A1623-1.xml

Copyright SUSE LLC under the Creative Commons License 4.0 with Attribution (CC-BY-4.0)

So this is a bit messy. I am reaching out to SUSE security by email.
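The kind of per-record cherry-picking this mix forces on consumers can be sketched as follows. This is only an illustration, not code from VulnerableCode. The function name, the regular expressions and the status labels are assumptions, and the sample texts are the three notices quoted above.

```python
import re

# Matches Creative Commons identifiers such as CC-BY-4.0 or CC-BY-NC-4.0.
CC_ID = re.compile(r"\bCC-BY(?:-NC)?(?:-SA)?(?:-ND)?-\d\.\d\b")
# Matches "Non Commercial", "Non-Commercial" or "noncommercial".
NON_COMMERCIAL = re.compile(r"non[\s-]?commercial", re.IGNORECASE)


def classify_notice(text):
    """Return (status, license_id) for a license notice.

    status is "ok" when exactly one identifier is found and it agrees with
    the wording, "ambiguous" when the wording mentions non-commercial use
    but the identifier does not (or the reverse), and "unknown" otherwise.
    """
    ids = sorted(set(CC_ID.findall(text)))
    if len(ids) != 1:
        return ("unknown", None)
    license_id = ids[0]
    mentions_nc = bool(NON_COMMERCIAL.search(text))
    if mentions_nc != ("-NC" in license_id):
        return ("ambiguous", license_id)
    return ("ok", license_id)


notices = [
    "The CVRF data is provided by SUSE under the Creative Commons License "
    "4.0 with Attribution (CC-BY-4.0).",
    "The SUSE CVRF data is provided by SUSE under the Creative Commons "
    "license, with Attribution for Non Commercial use: CC-BY-4.0 "
    "https://creativecommons.org/licenses/by/4.0/",
    "The CVRF data is provided by SUSE under the Creative Commons License "
    "4.0 with Attribution for Non-Commercial usage (CC-BY-NC-4.0).",
]

for notice in notices:
    print(classify_notice(notice))
```

A check like this only flags inconsistencies. It does not settle which license actually applies, which is why asking the publisher remains necessary.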

pombredanne avatar May 23 '22 09:05 pombredanne

I sent this to SUSE security by email:

Hi: Thank you for changing most of your vulnerability data license to CC-BY somewhat recently. Yet there are still some problems with leftover CC-BY-NC. This makes the data difficult to consume automatically, as each record needs to be cherry-picked based on whether its license allows usage or not (CC-BY-NC essentially prohibits any usage beyond mere reading). May I suggest using the plain CC-BY license consistently everywhere? Or updating your web pages and top-level license notices to consistently alert that there is a mix of CC-BY and CC-BY-NC?

Thank you for your kind consideration!

For extra details, please see https://github.com/nexB/vulnerablecode/issues/63#issuecomment-1134425600 for reference that I am pasting here:

pombredanne avatar May 23 '22 09:05 pombredanne

And we got a super speedy reply from SUSE security team:

Sorry for this oversight that it was not done consistently. (the ones affected were not being regenerated by the tooling.) I now did a massive replacement, and all of cvrf files should be fine. Also adjusted the LICENSE files in the directories.

Thank you ++ SUSE!

pombredanne avatar May 24 '22 08:05 pombredanne

I chatted on the side with Ubuntu folks on their IRC channel, #ubuntu-security on libera.chat

@stevebeattie FYI

This is about

  • #1051
  • #754

pombreda (Philippe Ombredanne):

Hiya :) What the license for the security data at https://ubuntu.com/security/notices (and the usn-db dump) [6:51 PM]

And the license for https://ubuntu.com/security/oval reports itself as GPL and I do not know what to do for data with a GPL. [6:51 PM]

Who to talk to? [6:51 PM]

FWIW, we aggregate this in our little project at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu.py and https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/ubuntu_usn.py and we like to have a license for that! [6:52 PM]

Debian did not have a license ... but that was clarified at https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/importers/debian_oval.py

Steve Beattie:

pombreda: hey, thanks for the question, and apologies that we don't have explicit terms on this stuff. In general, we want people to be able to consume, integrate, aggregate, and use the data presented in tools like nexB (so long as the data is represented accurately). [7:19 PM]

I'll poke people internally about getting more explicit statements in place.

pombredanne avatar Jan 16 '23 11:01 pombredanne