cve-bin-tool feat: Additional data included in JSON output file

Description

The JSON file contains only the metadata associated with each CVE. There is nothing about the date of the scan or the database that was used (i.e. when it was last updated). I propose updating the JSON file format to include this additional information.

Why?

This information is already included in the console and PDF output.

Environment context (optional)

Support a daily scan
Offline and Online
JSON format so that the output can be used as input into other tools
JSON files are used as they are easy to compare
No triage process yet - looking to use VEX in future
Linux is preferred environment

Aug 16 '23 16:08 anthonyharrison

@anthonyharrison Could you please guide me on what is to be done for this?

Sep 03 '23 12:09 metabiswadeep

@metabiswadeep My thoughts are to create a JSON document with two main sections

metadata which includes the following data

The name and version of the tool generating the file (cve-bin-tool and VERSION)
The date the file was generated
database info consisting of two subsections
- The date that the database was last updated
- The number of records in the database broken down into each data source

vulnerabilities which contains a number of subsections

summary - which identifies the counts for each severity level (Critical, High etc)
reports - which contains an array of the reported vulnerabilities, as currently reported. This should be subdivided into each of the various reports, e.g. NewFound CVEs etc.
no_reports - which identifies the products with no identified vulnerabilities

All of the information is currently reported in the console output.
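
A minimal sketch of how such a two-section document could be assembled. The function name `build_report`, its arguments, and the key names (`generation_date`, `database_info`, `record_counts`) are illustrative assumptions, not part of cve-bin-tool.

```python
from datetime import datetime, timezone


def build_report(tool_version, db_last_updated, db_record_counts, reports, no_reports):
    """Assemble the two main sections: metadata and vulnerabilities."""
    # Count reported entries per severity level for the summary subsection.
    summary = {}
    for entries in reports.values():
        for entry in entries:
            level = entry["severity"].lower()
            summary[level] = summary.get(level, 0) + 1

    return {
        "metadata": {
            "tool": {"name": "cve-bin-tool", "version": tool_version},
            "generation_date": datetime.now(timezone.utc).isoformat(),
            "database_info": {
                "last_updated": db_last_updated,
                # One count per data source (hypothetical key name).
                "record_counts": db_record_counts,
            },
        },
        "vulnerabilities": {
            "summary": summary,
            # Keyed by report type, e.g. "NewFound".
            "reports": reports,
            "no_reports": no_reports,
        },
    }
```

Here `reports` is assumed to be a dict mapping each report type to its list of CVE entries, so the subdivision by report type falls out of the input structure.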

Let me know if there is any other data you feel should also be included

Sep 03 '23 17:09 anthonyharrison

I think we talked about this but I'll chime in here for the record:

I think this is a good idea
I'd like to see us start publishing a json schema for our output files (I don't think we do?)
We probably need version numbers in the schema
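
As a rough illustration of a versioned schema, the fragment below expresses a JSON Schema as a Python dict and runs a very small hand-written check against it. The `schema_version` field, the title, and the helper `check_document` are assumptions made for this sketch; a real setup would more likely publish the schema as a file and use a full validator such as the third-party `jsonschema` package.

```python
import re

# Hypothetical schema fragment; field names are placeholders for discussion.
OUTPUT_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "cve-bin-tool JSON output (illustrative)",
    "type": "object",
    "required": ["schema_version", "metadata", "vulnerabilities"],
    "properties": {
        "schema_version": {"type": "string", "pattern": r"^\d+\.\d+$"},
        "metadata": {"type": "object"},
        "vulnerabilities": {"type": "object"},
    },
}


def check_document(document, schema=OUTPUT_SCHEMA):
    """Return a list of problems; covers only required keys and the version pattern."""
    problems = [
        f"missing key: {key}" for key in schema["required"] if key not in document
    ]
    version = document.get("schema_version")
    pattern = schema["properties"]["schema_version"]["pattern"]
    if isinstance(version, str) and not re.search(pattern, version):
        problems.append(f"bad schema_version: {version}")
    return problems
```

Carrying the version inside each output file lets a consumer decide which schema revision to validate against before reading the rest of the document.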

Sep 04 '23 04:09 terriko

I suggest we create a new format option json2 so that users of the current JSON file format are not affected by the changes.
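
A sketch of the idea, assuming the existing JSON output is a flat list of CVE records. The parser, the `WRITERS` table, and both writer functions are hypothetical stand-ins rather than the tool's actual CLI code.

```python
import argparse
import json


def write_json(cves):
    # Existing format (assumed): a bare array of CVE records, left unchanged.
    return json.dumps(cves, indent=2)


def write_json2(cves):
    # New format: the same records wrapped with a placeholder metadata section.
    document = {
        "metadata": {"tool": {"name": "cve-bin-tool"}},
        "vulnerabilities": {"report": cves},
    }
    return json.dumps(document, indent=2)


# Adding a format means adding one entry here; "json" keeps its old behaviour.
WRITERS = {"json": write_json, "json2": write_json2}

parser = argparse.ArgumentParser(prog="format-sketch")
parser.add_argument("-f", "--format", choices=sorted(WRITERS), default="json")


def render(argv, cves):
    """Parse the format option from argv and render the records with it."""
    args = parser.parse_args(argv)
    return WRITERS[args.format](cves)
```

Because `json` and `json2` are separate entries, tools that already parse the current file keep working, and new consumers opt in explicitly.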

Sep 04 '23 05:09 anthonyharrison

I think it's a great idea to implement a json2 schema with additional metadata. I think we can use the schema below as a reference for now. I'm not sure it is good, but it can give us a head start to improve and finalize the actual schema. @terriko @anthonyharrison what are your thoughts?

{
    "metadata": {
      "tool": {
        "name": "cve-bin-tool",
        "version": "3.3"
      },
      "generation_date": "2024-01-11"
    },
    "database_info": {
      "last_update_date": "2024-01-10",
      "record_counts": {
        "NVD": 25000,
        "OSV": 1200,
        "GAD": 500,
        "REDHAT": 500,
        "Curl": 500
      }
    },
    "vulnerabilities": {
      "summary": {
        "critical": 5,
        "high": 20,
        "medium": 50,
        "low": 100
      },
      "report": [
        {
          "datasource": "NVD",
          "entries": [
            {
              "vendor": "pypa",
              "product": "pip",
              "version": "20.0.2",
              "cve_number": "CVE-2018-20225",
              "severity": "HIGH",
              "score": 7.8,
              "source": "NVD",
              "cvss_version": 3,
              "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
              "epss_probability": 0.112,
              "epss_percentile": 0.43692,
              "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
              "remarks": "NewFound",
              "comments": ""
            },
            {
              "vendor": "pypa",
              "product": "pip",
              "version": "20.0.2",
              "cve_number": "CVE-2021-3572",
              "severity": "MEDIUM",
              "score": 5.7,
              "source": "NVD",
              "cvss_version": 3,
              "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
              "epss_probability": 0.057,
              "epss_percentile": 0.21609,
              "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
              "remarks": "NewFound",
              "comments": ""
            }
          ]
        }
      ]
    }
}

Jan 11 '24 18:01 mastersans

@mastersans This is going in the right direction. A few observations:

Ensure that the date fields include the time as well
The report should include all of the information which is produced to the console, so we need to also include the list of components which have been identified as having no vulnerabilities
It would be good to include within the metadata the parameters which have been used in the scan e.g. the list of checkers, any parameters which limit the data e.g. a CVSS score limit.

Note the desire to also define a JSON schema to support this new reporting structure.
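
For the date fields, one option is an ISO 8601 timestamp in UTC. The helper name `utc_timestamp` is an assumption for illustration; the calls themselves are from the standard `datetime` module.

```python
from datetime import datetime, timezone


def utc_timestamp(moment=None):
    """Format a datetime as an ISO 8601 UTC string with a trailing 'Z'."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Convert to UTC first so aware datetimes in other zones are normalised.
    # Note: a naive datetime is interpreted as local time by astimezone().
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Strings in this form sort chronologically and are unambiguous when comparing the outputs of scans run on different machines.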

Jan 11 '24 18:01 anthonyharrison

@anthonyharrison Got it, will work on something and keep you updated!! Thanks!!

Jan 11 '24 18:01 mastersans

Hey @anthonyharrison I was able to come up with the arguments below, which can limit the output. Did I miss something?

severity
metrics
cvss
checkers: skip, run

Jan 16 '24 17:01 mastersans

Hey @anthonyharrison can we discuss the schema of the new JSON format further? I would love to work on it and get it implemented.

Feb 13 '24 06:02 mastersans

Hello @mastersans Do you have an example of the new JSON file to share? I am keen that the JSON file includes as much information as possible.

There are a number of tools which can generate an initial schema from a JSON document - this might be a useful start to create a schema (will need some modification depending on the contents of the JSON document used to generate the schema)
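
To illustrate what such a tool does, here is a deliberately small sketch that infers a draft schema from a parsed JSON sample. The function `infer_schema` is hypothetical and covers only the basic JSON types; it is not one of the existing generators.

```python
def infer_schema(value):
    """Derive a draft JSON Schema fragment from one sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Only the first element is inspected; mixed arrays need manual edits.
            schema["items"] = infer_schema(value[0])
        return schema
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}
```

The result reflects only the sample it was given: a `score` that happens to be a whole number in the sample would be typed as `integer`, and every key present is marked required, which is the kind of modification the generated schema then needs by hand.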

Feb 13 '24 14:02 anthonyharrison

@anthonyharrison This is the sample I have. I included everything that can limit the output, mostly in the parameters section; let me know if I have missed anything. I also included all of your previous suggestions, including the datetime fields and the no_vulnerabilities list.

{
    "metadata": {
        "tool": {
            "name": "cve-bin-tool",
            "version": "3.3"
        },
        "generation_date": "2024-01-11T08:00:00Z",
        "parameters": {
            "severity": "high",
            "metrics": {
                "epss_probability": 0.05,
                "epss_percentile": 0.2
            },
            "cvss": 5,
            "checkers": {
                "skip": ["go", "expat"],
                "run": ["minicom","hwloc"]
            }
        }
    },
    "database_info": {
        "last_update_date": "2024-01-10T12:00:00Z",
        "enabled_sources": {
            "NVD": 25000,
            "OSV": 1200,
            "Curl": 500
        },
        "disabled_sources": {
            "GAD": 500,
            "REDHAT": 500
        }
    },
    "vulnerabilities": {
        "summary": {
            "critical": 5,
            "high": 20,
            "medium": 50,
            "low": 100
        },
        "report": [
            {
                "datasource": "NVD",
                "entries": [
                    {
                        "vendor": "pypa",
                        "product": "pip",
                        "version": "20.0.2",
                        "cve_number": "CVE-2018-20225",
                        "severity": "HIGH",
                        "score": 7.8,
                        "source": "NVD",
                        "cvss_version": 3,
                        "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
                        "epss_probability": 0.112,
                        "epss_percentile": 0.43692,
                        "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
                        "remarks": "NewFound",
                        "comments": ""
                    },
                    {
                        "vendor": "pypa",
                        "product": "pip",
                        "version": "20.0.2",
                        "cve_number": "CVE-2021-3572",
                        "severity": "MEDIUM",
                        "score": 5.7,
                        "source": "NVD",
                        "cvss_version": 3,
                        "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
                        "epss_probability": 0.057,
                        "epss_percentile": 0.21609,
                        "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
                        "remarks": "NewFound",
                        "comments": ""
                    }
                ]
            }
        ]
    },
    "no_vulnerabilities": [
        {
            "vendor": "polkit_project",
            "product": "polkit",
            "version": "124"
        }
    ]
}
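
Since the issue asks for output that can be compared between daily scans and fed into other tools, here is a sketch of how a consumer might diff two documents shaped like the sample above. The helper names `cve_keys` and `new_findings` are assumptions; the key names follow the sample.

```python
def cve_keys(document):
    """Collect (vendor, product, version, cve_number) for every reported entry."""
    keys = set()
    for block in document["vulnerabilities"]["report"]:
        for entry in block["entries"]:
            keys.add(
                (entry["vendor"], entry["product"], entry["version"], entry["cve_number"])
            )
    return keys


def new_findings(previous, current):
    """Return entries present in the current scan but not in the previous one."""
    return sorted(cve_keys(current) - cve_keys(previous))
```

With the metadata section present, the same consumer can also confirm that both scans used comparable parameters before trusting the difference.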

Feb 13 '24 15:02 mastersans

Hey @anthonyharrison a quick reminder in case you missed this one.

Mar 02 '24 13:03 mastersans

@mastersans Looking good.

I think the metadata section needs to capture more of the command line parameters which can be specified. e.g. the checkers which can be disabled, what data sources are disabled, what inputs were specified (e.g. directory, sbom) etc. There are a lot of parameters but if we record them, then we know what was used when we repeat the scan.

Mar 02 '24 13:03 anthonyharrison

@anthonyharrison Actually, I have included the disabled and enabled sources in database_info, and also the checkers skip and run lists in the metadata. Regarding the input file, I will include it in the metadata as well. Is there anything else that comes to mind that I should include? I have included almost everything that can limit the results, such as: severity, metrics, cvss, checkers (skip, run), and database (enabled, disabled).

Mar 02 '24 13:03 mastersans

I think capturing the input data would also be useful as this indicates what has been scanned.

Also whether exploits are being checked for.

Mar 02 '24 13:03 anthonyharrison

@anthonyharrison Great. I have worked on something similar for the config generator, so it wouldn't be difficult to tweak that to get the input parameters. Should I make a separate section for the input args or include them in the metadata?

Mar 02 '24 13:03 mastersans

@anthonyharrison I'll draft a PR and send it to you, as most of the format has matured and any small changes can be made easily.

Mar 06 '24 04:03 mastersans

Hey @anthonyharrison I captured the input parameters as you previously mentioned, and they are represented as shown below. I did it in a way that also captures the default values that are set in the args. What are your thoughts?

{
  "options": {
    "exclude": [],
    "disable-version-check": false,
    "disable-validation-check": false,
    "offline": false,
    "detailed": false
  },
  "cve_data_download": {
    "nvd": "json-mirror",
    "update": "daily",
    "disable-data-source": []
  },
  "input": {
    "directory": "test/sbom/cyclonedx_test.json"
  },
  "output": {
    "quiet": false,
    "log-level": "info",
    "format": "json2",
    "cvss": 0,
    "severity": "low",
    "metrics": false,
    "no-0-cve-report": false,
    "affected-versions": 0,
    "sbom-type": "",
    "sbom-format": ""
  },
  "merge_report": {
    "append": false,
    "filter": []
  },
  "database_management": {
    "ignore-sig": false,
    "log-signature-error": false
  },
  "exploits": {
    "exploits": false
  },
  "deprecated": {
    "extract": true,
    "report": false
  }
}

Mar 19 '24 14:03 mastersans

Should I include only those parameters that actually limit the output in some way?
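
One way to approximate this, sketched with a hypothetical parser: keep only the parameters whose values differ from the parser defaults. "Differs from the default" is only a stand-in for "limits the output", and the option names here are placeholders rather than the tool's real argument definitions.

```python
import argparse


def build_parser():
    # Placeholder options for illustration only.
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    parser.add_argument("--cvss", type=float, default=0)
    parser.add_argument("--severity", default="low")
    parser.add_argument("--offline", action="store_true")
    return parser


def non_default_parameters(parser, argv):
    """Return only the parsed values that differ from the parser's defaults."""
    parsed = vars(parser.parse_args(argv))
    return {
        name: value
        for name, value in parsed.items()
        if value != parser.get_default(name)
    }
```

The trade-off is that a reader of the file can no longer tell which defaults were in effect for that version of the tool.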

Mar 19 '24 15:03 mastersans

Thanks @mastersans This is looking very good. I think it is simpler if we keep all the values at this stage as there is little overhead in including all the values.

We should probably add something in the developer documentation to remind developers if additional command line parameters are added, that they should be included in the output file.

Mar 20 '24 09:03 anthonyharrison

@anthonyharrison Actually, if any new parameter is added to the tool, it will automatically be included in the parameters field, since the code I used is similar to the config generator. Also, I have shared my proposal on Gitter.
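
A minimal sketch of why this can work without extra bookkeeping, assuming the parameters are read from the parsed `argparse` namespace. The parser and the helper `recorded_parameters` are illustrative and not the config generator's actual code.

```python
import argparse


def build_parser():
    # Placeholder options for illustration only.
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    parser.add_argument("--severity", default="low")
    parser.add_argument("--offline", action="store_true")
    return parser


def recorded_parameters(parser, argv):
    """Return every parsed argument, defaults included, as a plain dict."""
    # vars() exposes the Namespace attributes, so any argument registered
    # on the parser appears here without further changes.
    return dict(vars(parser.parse_args(argv)))


parser = build_parser()
# A newly added option is picked up automatically by recorded_parameters().
parser.add_argument("--exploits", action="store_true")
params = recorded_parameters(parser, ["--severity", "high"])
```

Here `params` contains `exploits` even though nothing in `recorded_parameters` mentions it, which is the property described above.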

Mar 20 '24 09:03 mastersans