feat: Additional data included in JSON output file
Description
The JSON file just contains the metadata associated with each CVE. There is nothing related to the date of the scan or the database which has been used (i.e. when it was last updated). I propose updating the JSON file format to contain this additional information.
Why?
This information is already included in the console and PDF output.
Environment context (optional)
- Support a daily scan
- Offline and Online
- JSON format so that the output can be used as input into other tools
- JSON files are used because they are easy to compare
- No triage process yet - looking to use VEX in future
- Linux is preferred environment
@anthonyharrison Could you please guide me on what is to be done for this?
@metabiswadeep My thoughts are to create a JSON document with two main sections:
metadata, which includes the following data
- The name and version of the tool generating the file (cve-bin-tool and VERSION)
- The date the file was generated
- database info consisting of two subsections
  - The date that the database was last updated
  - The number of records in the database, broken down by data source
vulnerabilities, which contains a number of subsections
- summary - which identifies the counts for each severity level (Critical, High etc.)
- reports - which contains an array of the reported vulnerabilities, as currently reported. This should be subdivided into each of the various reports, e.g. NewFound CVEs etc.
- no_reports - which identifies the products with no identified vulnerabilities
All of the information is currently reported in the console output.
Let me know if there is any other data you feel should also be included
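For illustration, the two-section layout proposed above could be assembled roughly as follows. This is a minimal sketch, not cve-bin-tool code: the function `build_report`, its arguments, and the key names `tool`, `generation_date`, `database_info`, `last_updated` and `record_counts` are assumptions; only `metadata`, `vulnerabilities`, `summary`, `reports` and `no_reports` come from the proposal. All values are placeholders.

```python
import json


def build_report(tool_version, generated_at, db_last_updated, record_counts,
                 summary, reports, no_reports):
    """Assemble a document with the two proposed top-level sections."""
    return {
        "metadata": {
            # Name and version of the tool generating the file.
            "tool": {"name": "cve-bin-tool", "version": tool_version},
            # When the file was generated.
            "generation_date": generated_at,
            # Database info: last update plus record count per data source.
            "database_info": {
                "last_updated": db_last_updated,
                "record_counts": record_counts,
            },
        },
        "vulnerabilities": {
            "summary": summary,        # counts per severity level
            "reports": reports,        # keyed by report type, e.g. NewFound
            "no_reports": no_reports,  # products with no identified CVEs
        },
    }


# Placeholder inputs, for illustration only.
document = build_report(
    tool_version="VERSION",
    generated_at="2023-09-01T12:00:00Z",
    db_last_updated="2023-09-01T00:00:00Z",
    record_counts={"NVD": 0},
    summary={"critical": 0, "high": 0, "medium": 0, "low": 0},
    reports={"NewFound": []},
    no_reports=[],
)
print(json.dumps(document, indent=2))
```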
I think we talked about this but I'll chime in here for the record:
- I think this is a good idea
- I'd like to see us start publishing a json schema for our output files (I don't think we do?)
- We probably need version numbers in the schema
I suggest we create a new format option json2 so that users of the current JSON file format are not affected by the changes.
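One way to picture a separate json2 option next to the existing format is a small dispatch table, so the old writer is never touched. This is only a sketch under assumptions: the function names, the `schema_version` key and its value are hypothetical, and the legacy output is assumed here to be a plain list of CVE records.

```python
import json

SCHEMA_VERSION = "1.0"  # hypothetical version string for the new layout


def write_json(cves):
    # Legacy layout (assumed): the bare list of CVE records, unchanged.
    return json.dumps(cves)


def write_json2(cves):
    # New layout: the same records wrapped in a versioned envelope.
    return json.dumps(
        {"schema_version": SCHEMA_VERSION, "vulnerabilities": {"reports": cves}}
    )


# Each format name maps to its own writer, so adding json2 cannot
# alter what existing users of "json" receive.
WRITERS = {"json": write_json, "json2": write_json2}


def render(fmt, cves):
    return WRITERS[fmt](cves)


cves = [{"cve_number": "CVE-2018-20225", "severity": "HIGH"}]
legacy = json.loads(render("json", cves))
versioned = json.loads(render("json2", cves))
```

Carrying the version inside the file lets a consumer decide how to parse it before looking at anything else.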
I think it's a great idea to implement a json2 schema with additional metadata. I think we can use the schema given below for reference for now. I'm not sure it is good, but it can give us a head start to improve and finalize the actual schema. @terriko @anthonyharrison what are your thoughts?
{
  "metadata": {
    "tool": {
      "name": "cve-bin-tool",
      "version": "3.3"
    },
    "generation_date": "2024-01-11"
  },
  "database_info": {
    "last_update_date": "2024-01-10",
    "record_counts": {
      "NVD": 25000,
      "OSV": 1200,
      "GAD": 500,
      "REDHAT": 500,
      "Curl": 500
    }
  },
  "vulnerabilities": {
    "summary": {
      "critical": 5,
      "high": 20,
      "medium": 50,
      "low": 100
    },
    "report": [
      {
        "datasource": "NVD",
        "entries": [
          {
            "vendor": "pypa",
            "product": "pip",
            "version": "20.0.2",
            "cve_number": "CVE-2018-20225",
            "severity": "HIGH",
            "score": 7.8,
            "source": "NVD",
            "cvss_version": 3,
            "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
            "epss_probability": 0.112,
            "epss_percentile": 0.43692,
            "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
            "remarks": "NewFound",
            "comments": ""
          },
          {
            "vendor": "pypa",
            "product": "pip",
            "version": "20.0.2",
            "cve_number": "CVE-2021-3572",
            "severity": "MEDIUM",
            "score": 5.7,
            "source": "NVD",
            "cvss_version": 3,
            "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
            "epss_probability": 0.057,
            "epss_percentile": 0.21609,
            "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
            "remarks": "NewFound",
            "comments": ""
          }
        ]
      }
    ]
  }
}
@mastersans This is going in the right direction. A few observations:
- Ensure that the date fields also include the time
- The report should include all of the information which is produced to the console, so we also need to include the list of components which have been identified as having no vulnerabilities
- It would be good to include within the metadata the parameters which have been used in the scan, e.g. the list of checkers and any parameters which limit the data, such as a CVSS score limit.
Note the desire to also define a JSON schema to support this new reporting structure.
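On the first observation, a date-plus-time field is easiest to compare across daily scans if it is always written in UTC in ISO 8601 form. A minimal sketch follows; the helper name `utc_timestamp` is hypothetical, and the fixed example date simply reuses the day from the sample above.

```python
from datetime import datetime, timezone


def utc_timestamp(moment=None):
    """Format a datetime as ISO 8601 in UTC, e.g. 2024-01-11T08:00:00Z."""
    # Default to the current time; timezone-aware so no local-time ambiguity.
    moment = moment or datetime.now(timezone.utc)
    # Convert to UTC first. Note that astimezone() treats a naive
    # datetime as local time, so callers should pass aware values.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


fixed = datetime(2024, 1, 11, 8, 0, 0, tzinfo=timezone.utc)
print(utc_timestamp(fixed))  # 2024-01-11T08:00:00Z
```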
@anthonyharrison Got it, will work on something and keep you updated!! Thanks!!
Hey @anthonyharrison So far I have come up with the arguments listed below that can limit the output. Did I miss anything?
- severity
- metrics
- cvss
- checkers: skip, run
Hey @anthonyharrison can we discuss the schema of the new JSON format further? I would love to work on it and get it implemented.
Hello @mastersans Do you have an example of the new JSON file to share? I am keen that the JSON file includes as much information as possible.
There are a number of tools which can generate an initial schema from a JSON document - this might be a useful start to create a schema (will need some modification depending on the contents of the JSON document used to generate the schema)
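To show what such a generated starting point looks like, here is a minimal sketch of schema inference from a sample document. It is not any particular tool; `infer_schema` is a hypothetical helper, and, as noted above, its output would need manual modification (it marks every key seen as required and inspects only the first array element).

```python
import json


def infer_schema(value):
    """Return a rough JSON Schema fragment describing one sample value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Only the first element is inspected in this sketch.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            # Every key present in the sample is assumed to be required.
            "required": sorted(value),
        }
    raise TypeError(f"unsupported JSON value: {value!r}")


sample = json.loads(
    '{"metadata": {"tool": {"name": "cve-bin-tool", "version": "3.3"}}}'
)
schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```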
@anthonyharrison This is the sample I have. I included everything that can limit the output, mostly in the parameters section; let me know if I have missed anything. I also included all of your previous suggestions, including the datetime and no-vulnerabilities list.
{
  "metadata": {
    "tool": {
      "name": "cve-bin-tool",
      "version": "3.3"
    },
    "generation_date": "2024-01-11T08:00:00Z",
    "parameters": {
      "severity": "high",
      "metrics": {
        "epss_probability": 0.05,
        "epss_percentile": 0.2
      },
      "cvss": 5,
      "checkers": {
        "skip": ["go", "expat"],
        "run": ["minicom", "hwloc"]
      }
    }
  },
  "database_info": {
    "last_update_date": "2024-01-10T12:00:00Z",
    "enabled_sources": {
      "NVD": 25000,
      "OSV": 1200,
      "Curl": 500
    },
    "disabled_sources": {
      "GAD": 500,
      "REDHAT": 500
    }
  },
  "vulnerabilities": {
    "summary": {
      "critical": 5,
      "high": 20,
      "medium": 50,
      "low": 100
    },
    "report": [
      {
        "datasource": "NVD",
        "entries": [
          {
            "vendor": "pypa",
            "product": "pip",
            "version": "20.0.2",
            "cve_number": "CVE-2018-20225",
            "severity": "HIGH",
            "score": 7.8,
            "source": "NVD",
            "cvss_version": 3,
            "cvss_vector": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
            "epss_probability": 0.112,
            "epss_percentile": 0.43692,
            "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-p",
            "remarks": "NewFound",
            "comments": ""
          },
          {
            "vendor": "pypa",
            "product": "pip",
            "version": "20.0.2",
            "cve_number": "CVE-2021-3572",
            "severity": "MEDIUM",
            "score": 5.7,
            "source": "NVD",
            "cvss_version": 3,
            "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N",
            "epss_probability": 0.057,
            "epss_percentile": 0.21609,
            "paths": "/home/ss/test/venv/share/python-wheels/pip-20.0.2-py",
            "remarks": "NewFound",
            "comments": ""
          }
        ]
      }
    ]
  },
  "no_vulnerabilities": [
    {
      "vendor": "polkit_project",
      "product": "polkit",
      "version": "124"
    }
  ]
}
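Since the point of the JSON output is to feed other tools, a consumer of this layout might recount severities from the report entries and compare them with the summary block. The sketch below assumes the draft layout shown in the sample (`vulnerabilities.report[].entries[].severity`); `count_severities` is a hypothetical helper, and the input is a trimmed copy of the sample. The summary figures in the sample are illustrative and do not match its two entries, so a mismatch is expected here.

```python
from collections import Counter


def count_severities(document):
    """Count report entries per severity level (lower-cased)."""
    counts = Counter()
    for group in document["vulnerabilities"]["report"]:
        for entry in group["entries"]:
            counts[entry["severity"].lower()] += 1
    return dict(counts)


# Trimmed copy of the sample: only the fields this check needs.
sample = {
    "vulnerabilities": {
        "summary": {"critical": 5, "high": 20, "medium": 50, "low": 100},
        "report": [
            {
                "datasource": "NVD",
                "entries": [
                    {"cve_number": "CVE-2018-20225", "severity": "HIGH"},
                    {"cve_number": "CVE-2021-3572", "severity": "MEDIUM"},
                ],
            }
        ],
    }
}

recount = count_severities(sample)
summary = sample["vulnerabilities"]["summary"]
# Collect the levels whose summary figure disagrees with the recount.
mismatched = sorted(
    level for level, total in summary.items() if recount.get(level, 0) != total
)
print(recount)
print(mismatched)
```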
Hey @anthonyharrison a quick reminder in case you missed this one.
@mastersans Looking good.
I think the metadata section needs to capture more of the command line parameters which can be specified, e.g. the checkers which can be disabled, what data sources are disabled, what inputs were specified (e.g. directory, sbom) etc. There are a lot of parameters, but if we record them, then we know what was used when we repeat the scan.
@anthonyharrison Actually, I have included the disabled and enabled sources in the database info, and the checkers (skip and run) in the metadata. I will include the input file in the metadata as well. Is there anything else that comes to mind? I have included almost everything that can limit the results, such as: severity, metrics, cvss, checkers (skip, run) and database (enabled, disabled).
I think capturing the input data would also be useful as this indicates what has been scanned.
Also whether exploits are being checked for.
@anthonyharrison Great. I have done something similar for the config generator, so it wouldn't be difficult to tweak that to get the input parameters. Should I make a separate section for the input args or include them in the metadata?
@anthonyharrison I'll draft a PR and send it to you, as most of the format has matured and any small changes can be made easily.
Hey @anthonyharrison I captured the input parameters as you mentioned; they are represented as shown below. Note that this also captures the default values set for the args. What are your thoughts?
{
  "options": {
    "exclude": [],
    "disable-version-check": false,
    "disable-validation-check": false,
    "offline": false,
    "detailed": false
  },
  "cve_data_download": {
    "nvd": "json-mirror",
    "update": "daily",
    "disable-data-source": []
  },
  "input": {
    "directory": "test/sbom/cyclonedx_test.json"
  },
  "output": {
    "quiet": false,
    "log-level": "info",
    "format": "json2",
    "cvss": 0,
    "severity": "low",
    "metrics": false,
    "no-0-cve-report": false,
    "affected-versions": 0,
    "sbom-type": "",
    "sbom-format": ""
  },
  "merge_report": {
    "append": false,
    "filter": []
  },
  "database_management": {
    "ignore-sig": false,
    "log-signature-error": false
  },
  "exploits": {
    "exploits": false
  },
  "deprecated": {
    "extract": true,
    "report": false
  }
}
I am thinking of including only those parameters that actually limit the output in some way.
Thanks @mastersans This is looking very good. I think it is simpler if we keep all the values at this stage as there is little overhead in including all the values.
We should probably add something in the developer documentation to remind developers if additional command line parameters are added, that they should be included in the output file.
@anthonyharrison Actually, if any new parameter is added to the tool, it will automatically be included in the parameters field, since the code I used is similar to the config generator. I have also shared my proposal on Gitter.
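For readers wondering how parameters can be picked up automatically, one possible approach is to walk the argument groups of an `argparse` parser and record every parsed value, defaults included. This is a sketch under assumptions, not the code from the PR or the config generator: the parser below is a hypothetical, heavily reduced stand-in for the real CLI, and `_action_groups` and `_group_actions` are private `argparse` attributes that exist in CPython but are not public API.

```python
import argparse


def build_parser():
    # Hypothetical, reduced parser; not the real cve-bin-tool CLI.
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    output = parser.add_argument_group("output")
    output.add_argument("--format", default="console")
    output.add_argument("--cvss", type=float, default=0)
    exploits = parser.add_argument_group("exploits")
    exploits.add_argument("--exploits", action="store_true")
    return parser


def collect_parameters(parser, args):
    """Group every parsed value, including defaults, by argument group."""
    parameters = {}
    for group in parser._action_groups:  # private attribute (assumption)
        values = {
            action.dest: getattr(args, action.dest)
            for action in group._group_actions  # private attribute
            # The built-in help action has no value in the namespace.
            if hasattr(args, action.dest)
        }
        if values:
            parameters[group.title] = values
    return parameters


parser = build_parser()
args = parser.parse_args(["--format", "json2"])
parameters = collect_parameters(parser, args)
print(parameters)
```

Because the loop reads whatever the parser defines, an argument added to any group later appears in the result without further changes, which is the property described above.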