safety
safety copied to clipboard
Better JSON structure for easy and safer parsing
- safety version: 1.8.5
- Python version: 3.6.7
- Operating System: Ubuntu 18.04.1 LTS
Description
Run the following command from a terminal:
echo "Jinja==1.0.0" | safety check --stdin --full-report --json
The result that will be returned looks like this:
[
[
"jinja",
"<2.7.2",
"1.0.0",
"jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
"25863"
],
[
"jinja",
"<2.7.3",
"1.0.0",
"jinja 2.7.3 fixes a security issue: Corrected the security fix for the cache folder.",
"25864"
]
]
As you can see it returns a list of all CVE's that where found along with information about the CVE. There is a big downside about the current structure though (or atleast I think so). As this is a list with values inside it means I have to do extra checks or dangerous operations to get values out.
Now imagine that the script is run from Python within a os.popen
or equivalent way like this:
command = (
"echo \"Jinja==1.0.0\"" | safety check --stdin --full-report --json"
)
cve_result_details = json.loads(os.popen(command).read())
I'll have a JSON
dict just like it was sent. Now how can I safely get out the upper version of the CVE? I'd have to do something like:
if cve_result_details:
upper_version = cve_result_details[0][1]
This feels pretty dangerous & risky. I'd propose another JSON structure that looks like this:
{
"cve_reports": [
{
"package_name": "jinja",
"upper_version": "<2.7.2",
"installed_version": "1.0.0",
"package_description": "jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
"25863"
}
]
}
This would allow to do cleaner & safer operations. To get out the upper version I could now do:
upper_version = cve_result.get('cve_reports').get('upper_version')
The benefits:
- Will not crash if no result
- Cleaner to write and easier to understand
@Jwomers thanks for creating/working on this project! Could I ask you for feedback about this one maybe?
There's a typo. You'd still need to write
upper_version = cve_result.get('cve_reports')[0].get('upper_version')
I'd say the first dictionary is not necessary:
[
{
"package_name": "jinja",
"upper_version": "<2.7.2",
"installed_version": "1.0.0",
"package_description": "jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
"var": "25863"
}
]
And
for cve in cve_result:
upper_version = cve.get("upper_version")
# etc.
But then you'd check if upper_version
is None
right? And if safety always return length-5 arrays, it's pretty safe to do:
for cve in cve_result:
package_name, upper_version, installed_version, package_description, var = cve
# do something with these variables
You can even wrap it in a namedtuple
:
from collections import namedtuple
CVE = namedtuple("CVE", "package_name upper_version installed_version package_description var")
for cve_data in cve_result:
cve = CVE(cve_data)
print(cve.upper_version)
print(cve.package_name)
# etc.
One benefit of using dictionaries instead of lists/arrays though would be that user code would not rely on the order of elements, but rather get them by name, making the code more robust to potential changes in safety.
@pawamoy yep, agreed with you! I think that the benefit of being more robust for changes is a great benefit for safety. It could even be a new parameter --json-detailed
so that old users that use the API still get back what they except while exposing a new method which is more detailed.
@Yenthe666 I agree we might need a better JSON reporter, and the --json-detailed
is a good way to add this without breaking other users.
There is just one thing I am not following yet: upper_version. What you call as upper version is in fact a spec version, indicating affected version for a reported vulnerability. What happens is that some specs can include a range of versions, and there is a rare chance they might intersect each other.
Anyway, I am not saying that this is not still a good idea. I am just saying that Safety is not making a decision about which version you should upgrade to. And I don't think that's the motivation here.
Also upvoting this issue as I tried today to convert the JSON output to a JUnit format (with Cucumber-JSON-to-JUnit) because the CI in my company needs it. However Cucumber failed to parse the JSON stating it was invalid (though valid per RFCs), so we would also need a more standard JSON output
As a workaround, this jq filter will make the transformation as part of a pipeline:
jq '[.[] | { package_name: .[0], upper_version: .[1], installed_version: .[2], description: .[3], var: .[4]}]'
The enclosing square brackets can be omitted to produce the JSONL format, if desired.
@harlekeyn can you look at this too?
One benefit of using dictionaries instead of lists/arrays though would be that user code would not rely on the order of elements, but rather get them by name, making the code more robust to potential changes in safety.
My code recently broke indeed, because of added fields.
I'm closing this because Safety 2.0 improved the JSON structure with more information and added remediations for API Keys users.
Thanks to you all for the ideas to improve our JSON report.