cve-bin-tool

feat: Additional data included in JSON output file

Open anthonyharrison opened this issue 2 years ago • 21 comments

Description

The JSON file contains only the metadata associated with each CVE. Nothing records the date of the scan or the database that was used (i.e. when it was last updated). I propose updating the JSON file format to include this additional information.

Why?

This information is already included in the console and pdf output

Environment context (optional)

  • Support a daily scan
  • Offline and Online
  • JSON format so that the output can be used as input into other tools
  • JSON files are used because they are easy to compare
  • No triage process yet - looking to use VEX in future
  • Linux is preferred environment

anthonyharrison avatar Aug 16 '23 16:08 anthonyharrison

@anthonyharrison Could you please guide me on what is to be done for this?

metabiswadeep avatar Sep 03 '23 12:09 metabiswadeep

@metabiswadeep My thoughts are to create a JSON document with two main sections

metadata which includes the following data

  • The name and version of the tool generating the file (cve-bin-tool and VERSION)
  • The date the file was generated
  • database info consisting of two subsections
    • The date that the database was last updated
    • The number of records in the database broken down into each data source

vulnerabilities which contains a number of subsections

  • summary - which identifies the counts for each severity level (Critical, High etc)
  • reports - which contains an array of the reported vulnerabilities, as currently reported. This should be subdivided into each of the various reports, e.g. NewFound CVEs etc.
  • no_reports - which identifies the products with no identified vulnerabilities

All of the information is currently reported in the console output.

Let me know if there is any other data you feel should also be included
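
As a rough illustration of the summary subsection described above, the per-severity counts could be derived from the reported entries. This is a minimal sketch, not the tool's implementation: the `summarise` helper and the trimmed-down entries are assumptions made for the example.

```python
from collections import Counter

# Hypothetical, trimmed-down entries; real records carry more fields.
entries = [
    {"product": "pip", "cve_number": "CVE-2018-20225", "severity": "HIGH"},
    {"product": "pip", "cve_number": "CVE-2021-3572", "severity": "MEDIUM"},
]

LEVELS = ("critical", "high", "medium", "low")


def summarise(entries):
    """Count reported vulnerabilities per severity level."""
    counts = Counter(entry["severity"].lower() for entry in entries)
    # Always emit every level so that two reports are easy to compare.
    # Severities outside LEVELS are ignored in this sketch.
    return {level: counts.get(level, 0) for level in LEVELS}


summary = summarise(entries)
print(summary)
```

Emitting a zero for unused levels keeps the shape of the summary stable between daily scans.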

anthonyharrison avatar Sep 03 '23 17:09 anthonyharrison

I think we talked about this but I'll chime in here for the record:

  • I think this is a good idea
  • I'd like to see us start publishing a json schema for our output files (I don't think we do?)
  • We probably need version numbers in the schema

terriko avatar Sep 04 '23 04:09 terriko

I suggest we create a new format option json2 so that users of the current JSON file format are not affected by the changes.
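
One way to picture a separate json2 option is a dispatch on the format name that leaves the existing writer untouched. The sketch below is only an assumption about how this could look: `render_json`, `render_json2` and `render` are hypothetical names, and the current format is assumed to be a flat list of CVE records.

```python
import json


def render_json(cves):
    """Existing format (assumed): a flat list of CVE records."""
    return json.dumps(cves, indent=2)


def render_json2(cves, metadata):
    """Proposed format: the same records wrapped with scan metadata."""
    document = {"metadata": metadata, "vulnerabilities": {"report": cves}}
    return json.dumps(document, indent=2)


def render(fmt, cves, metadata):
    """Pick a writer by format name; 'json' output stays unchanged."""
    if fmt == "json":
        return render_json(cves)
    if fmt == "json2":
        return render_json2(cves, metadata)
    raise ValueError(f"unsupported format: {fmt}")


cves = [{"product": "pip", "cve_number": "CVE-2018-20225"}]
metadata = {"tool": {"name": "cve-bin-tool", "version": "3.3"}}
old_output = json.loads(render("json", cves, metadata))
new_output = json.loads(render("json2", cves, metadata))
```

Consumers of the existing format keep receiving a list, while json2 consumers find the same records under `vulnerabilities.report`.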

On Mon, 4 Sept 2023, 05:40 Terri Oda, @.***> wrote:

I think we talked about this but I'll chime in here for the record:

  • I think this is a good idea
  • I'd like to see us start publishing a json schema for our output files (I don't think we do?)
  • We probably need version numbers in the schema

— Reply to this email directly, view it on GitHub https://github.com/intel/cve-bin-tool/issues/3259#issuecomment-1704599186, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACAID22STS3CKUTQW4D4QA3XYVLSJANCNFSM6AAAAAA3SZPY6M . You are receiving this because you were mentioned.Message ID: @.***>

anthonyharrison avatar Sep 04 '23 05:09 anthonyharrison

I think it's a great idea to implement a json2 schema with additional metadata. I think we can use the schema below as a reference for now. I'm not sure it is good, but it can give us a head start to improve and finalize the actual schema. @terriko @anthonyharrison what are your thoughts?

{
    "metadata": {
      "tool": {
        "name": "cve-bin-tool",
        "version": "3.3"
      },
      "generation_date": "2024-01-11"
    },
    "database_info": {
      "last_update_date": "2024-01-10",
      "record_counts": {
        "NVD": 25000,
        "OSV": 1200,
        "GAD": 500,
        "REDHAT": 500,
        "Curl": 500
      }
    },
    "vulnerabilities": {
      "summary": {
        "critical": 5,
        "high": 20,
        "medium": 50,
        "low": 100
      },
      "report": [
        {
          "datasource": "NVD",
          "entries": [
            {
              "vendor": "pypa",
              "product": "pip",
              "version": "20.0.2",
              "cve_number": "CVE-2018-20225",
              "severity": "HIGH",
              "score": 7.8,
              "source": "NVD",
              "cvss_version": 3,
              "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
              "epss_probability": 0.112,
              "epss_percentile": 0.43692,
              "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
              "remarks": "NewFound",
              "comments": ""
            },
            {
              "vendor": "pypa",
              "product": "pip",
              "version": "20.0.2",
              "cve_number": "CVE-2021-3572",
              "severity": "MEDIUM",
              "score": 5.7,
              "source": "NVD",
              "cvss_version": 3,
              "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
              "epss_probability": 0.057,
              "epss_percentile": 0.21609,
              "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
              "remarks": "NewFound",
              "comments": ""
            }
          ]
        }
      ]
    }
}

mastersans avatar Jan 11 '24 18:01 mastersans

@mastersans This is going in the right direction. A few observations:

  1. Ensure that the date fields also include the time
  2. The report should include all of the information which is produced to the console, so we need to also include the list of components which have been identified as having no vulnerabilities
  3. It would be good to include within the metadata the parameters which have been used in the scan e.g. the list of checkers, any parameters which limit the data e.g. a CVSS score limit.
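
For the first observation, a date-time value can be produced with the standard library. This is a minimal sketch assuming UTC timestamps in ISO 8601 form; the `utc_timestamp` helper is a hypothetical name, not part of the tool.

```python
from datetime import datetime, timezone


def utc_timestamp(moment=None):
    """Return an ISO 8601 date-time string in UTC with a trailing 'Z'."""
    moment = moment or datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A fixed moment is used here so the result is reproducible.
stamp = utc_timestamp(datetime(2024, 1, 11, 8, 0, 0, tzinfo=timezone.utc))
print(stamp)  # 2024-01-11T08:00:00Z
```

Recording UTC rather than local time keeps reports from different machines comparable.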

Note the desire to also define a JSON schema to support this new reporting structure.

anthonyharrison avatar Jan 11 '24 18:01 anthonyharrison

@anthonyharrison Got it, will work on something and keep you updated!! Thanks!!

mastersans avatar Jan 11 '24 18:01 mastersans

Hey @anthonyharrison, I came up with the arguments listed below, which can limit the output. Did I miss anything?

  1. severity
  2. metrics
  3. cvss
  4. checkers: skip, run

mastersans avatar Jan 16 '24 17:01 mastersans

Hey @anthonyharrison, can we discuss the schema of the new JSON format further? I would love to work on it and get it implemented.

mastersans avatar Feb 13 '24 06:02 mastersans

Hello @mastersans Do you have an example of the new JSON file to share? I am keen that the JSON file includes as much information as possible.

There are a number of tools which can generate an initial schema from a JSON document - this might be a useful start to create a schema (will need some modification depending on the contents of the JSON document used to generate the schema)
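
To show what such a generated starting point might look like, here is a small hand-rolled sketch that infers a rough JSON Schema fragment from a sample document. It is not one of the tools referred to above; `infer_schema` is a hypothetical helper, and it assumes arrays are homogeneous.

```python
import json


def infer_schema(value):
    """Derive a rough JSON Schema fragment from one sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so only the first item is inspected.
        schema = {"type": "array"}
        if value:
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


sample = json.loads(
    '{"metadata": {"tool": {"name": "cve-bin-tool", "version": "3.3"}},'
    ' "no_vulnerabilities": []}'
)
schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```

A schema inferred this way marks every key seen in the sample as required and cannot describe empty arrays, which is why manual modification is still needed afterwards.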

anthonyharrison avatar Feb 13 '24 14:02 anthonyharrison

@anthonyharrison This is the sample I have. I included everything that can limit the output, mostly in the parameters section; let me know if I have missed anything. I also included all of your previous suggestions, including the date-time fields and the no_vulnerabilities list.

{
    "metadata": {
        "tool": {
            "name": "cve-bin-tool",
            "version": "3.3"
        },
        "generation_date": "2024-01-11T08:00:00Z",
        "parameters": {
            "severity": "high",
            "metrics": {
                "epss_probability": 0.05,
                "epss_percentile": 0.2
            },
            "cvss": 5,
            "checkers": {
                "skip": ["go", "expat"],
                "run": ["minicom", "hwloc"]
            }
        }
    },
    "database_info": {
        "last_update_date": "2024-01-10T12:00:00Z",
        "enabled_sources": {
            "NVD": 25000,
            "OSV": 1200,
            "Curl": 500
        },
        "disabled_sources": {
            "GAD": 500,
            "REDHAT": 500
        }
    },
    "vulnerabilities": {
        "summary": {
            "critical": 5,
            "high": 20,
            "medium": 50,
            "low": 100
        },
        "report": [
            {
                "datasource": "NVD",
                "entries": [
                    {
                        "vendor": "pypa",
                        "product": "pip",
                        "version": "20.0.2",
                        "cve_number": "CVE-2018-20225",
                        "severity": "HIGH",
                        "score": 7.8,
                        "source": "NVD",
                        "cvss_version": 3,
                        "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
                        "epss_probability": 0.112,
                        "epss_percentile": 0.43692,
                        "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
                        "remarks": "NewFound",
                        "comments": ""
                    },
                    {
                        "vendor": "pypa",
                        "product": "pip",
                        "version": "20.0.2",
                        "cve_number": "CVE-2021-3572",
                        "severity": "MEDIUM",
                        "score": 5.7,
                        "source": "NVD",
                        "cvss_version": 3,
                        "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
                        "epss_probability": 0.057,
                        "epss_percentile": 0.21609,
                        "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
                        "remarks": "NewFound",
                        "comments": ""
                    }
                ]
            }
        ]
    },
    "no_vulnerabilities": [
        {
            "vendor": "polkit_project",
            "product": "polkit",
            "version": "124"
        }
    ]
}

mastersans avatar Feb 13 '24 15:02 mastersans

Hey @anthonyharrison, a quick reminder in case you missed this one.

mastersans avatar Mar 02 '24 13:03 mastersans

@mastersans Looking good.

I think the metadata section needs to capture more of the command line parameters which can be specified. e.g. the checkers which can be disabled, what data sources are disabled, what inputs were specified (e.g. directory, sbom) etc. There are a lot of parameters but if we record them, then we know what was used when we repeat the scan.

anthonyharrison avatar Mar 02 '24 13:03 anthonyharrison

@anthonyharrison Actually, I have included the disabled and enabled sources in the database info, and the checkers' skip and run lists in the metadata. I will include the input file in the metadata as well. Is there anything else that comes to mind? I have included almost everything that can limit the results: severity, metrics, cvss, checkers (skip, run), and database sources (enabled, disabled).

mastersans avatar Mar 02 '24 13:03 mastersans

I think capturing the input data would also be useful, as this indicates what has been scanned.

Also whether exploits are being checked for.

anthonyharrison avatar Mar 02 '24 13:03 anthonyharrison

@anthonyharrison Great. I have worked on something similar for the config generator, so it wouldn't be difficult to tweak that to get the input parameters. Should I make a separate section for the input args or include them in the metadata?

mastersans avatar Mar 02 '24 13:03 mastersans

@anthonyharrison I'll draft a PR and send it to you, since most of the format is mature and any small changes can be made easily.

mastersans avatar Mar 06 '24 04:03 mastersans

Hey @anthonyharrison, I captured the input parameters as you previously mentioned; they are represented as shown below. I did it in a way that also captures the default values set in the args. What are your thoughts?

{
  "options": {
    "exclude": [],
    "disable-version-check": false,
    "disable-validation-check": false,
    "offline": false,
    "detailed": false
  },
  "cve_data_download": {
    "nvd": "json-mirror",
    "update": "daily",
    "disable-data-source": []
  },
  "input": {
    "directory": "test/sbom/cyclonedx_test.json"
  },
  "output": {
    "quiet": false,
    "log-level": "info",
    "format": "json2",
    "cvss": 0,
    "severity": "low",
    "metrics": false,
    "no-0-cve-report": false,
    "affected-versions": 0,
    "sbom-type": "",
    "sbom-format": ""
  },
  "merge_report": {
    "append": false,
    "filter": []
  },
  "database_management": {
    "ignore-sig": false,
    "log-signature-error": false
  },
  "exploits": {
    "exploits": false
  },
  "deprecated": {
    "extract": true,
    "report": false
  }
}

mastersans avatar Mar 19 '24 14:03 mastersans

I am thinking of including only those parameters that actually limit the output in some way or another.

mastersans avatar Mar 19 '24 15:03 mastersans

Thanks @mastersans This is looking very good. I think it is simpler if we keep all the values at this stage as there is little overhead in including all the values.

We should probably add something to the developer documentation to remind developers that if additional command line parameters are added, they should be included in the output file.

anthonyharrison avatar Mar 20 '24 09:03 anthonyharrison

@anthonyharrison Actually, if any new parameter is added to the tool, it will automatically be included in the parameters field, because the code I used is similar to the config generator. I have also shared my proposal on Gitter.
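
The idea of picking up new parameters automatically can be sketched with argparse alone. This is an illustration under assumptions, not the code from the config generator or the PR: the `add_option` and `capture_parameters` helpers, the `scan-sketch` program name and the handful of options are all hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="scan-sketch")
sections = {}  # section name -> destinations of the options registered under it


def add_option(section, *flags, **kwargs):
    """Register an option and remember which section it belongs to."""
    action = parser.add_argument(*flags, **kwargs)
    sections.setdefault(section, []).append(action.dest)


# Hypothetical subset of options; any option added later through add_option
# is picked up by capture_parameters without further changes.
add_option("input", "--directory", default="")
add_option("output", "--format", default="console")
add_option("output", "--cvss", type=float, default=0)
add_option("exploits", "--exploits", action="store_true")


def capture_parameters(argv):
    """Return every option, defaults included, grouped by section."""
    values = vars(parser.parse_args(argv))
    return {
        name: {dest: values[dest] for dest in dests}
        for name, dests in sections.items()
    }


params = capture_parameters(["--format", "json2", "--cvss", "5"])
print(params)
```

Because the values come from the parsed namespace rather than from the raw command line, options left at their defaults are recorded as well, which matches the decision to keep all values in the output.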

mastersans avatar Mar 20 '24 09:03 mastersans