safety icon indicating copy to clipboard operation
safety copied to clipboard

Better JSON structure for easy and safer parsing

Open Yenthe666 opened this issue 5 years ago • 8 comments

  • safety version: 1.8.5
  • Python version: 3.6.7
  • Operating System: Ubuntu 18.04.1 LTS

Description

Run the following command from a terminal: echo "Jinja==1.0.0" | safety check --stdin --full-report --json The result that will be returned looks like this:

[
    [
        "jinja",
        "<2.7.2",
        "1.0.0",
        "jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems.  See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
        "25863"
    ],
    [
        "jinja",
        "<2.7.3",
        "1.0.0",
        "jinja 2.7.3 fixes a security issue: Corrected the security fix for the cache folder.",
        "25864"
    ]
]

As you can see it returns a list of all CVE's that where found along with information about the CVE. There is a big downside about the current structure though (or atleast I think so). As this is a list with values inside it means I have to do extra checks or dangerous operations to get values out. Now imagine that the script is run from Python within a os.popen or equivalent way like this:

command = (
    "echo \"Jinja==1.0.0\"" | safety check --stdin --full-report --json"
)
cve_result_details = json.loads(os.popen(command).read())

I'll have a JSON dict just like it was sent. Now how can I safely get out the upper version of the CVE? I'd have to do something like:

if cve_result_details:
    upper_version = cve_result_details[0][1]

This feels pretty dangerous & risky. I'd propose another JSON structure that looks like this:

{
    "cve_reports": [
      {
         "package_name": "jinja",
        "upper_version": "<2.7.2",
        "installed_version": "1.0.0",
        "package_description": "jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems.  See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
        "25863"
        }
    ]
}

This would allow to do cleaner & safer operations. To get out the upper version I could now do:

upper_version = cve_result.get('cve_reports').get('upper_version')

The benefits:

  • Will not crash if no result
  • Cleaner to write and easier to understand

Yenthe666 avatar Oct 17 '19 15:10 Yenthe666

@Jwomers thanks for creating/working on this project! Could I ask you for feedback about this one maybe?

Yenthe666 avatar Oct 17 '19 15:10 Yenthe666

There's a typo. You'd still need to write

upper_version = cve_result.get('cve_reports')[0].get('upper_version')

I'd say the first dictionary is not necessary:

[
    {
        "package_name": "jinja",
        "upper_version": "<2.7.2",
        "installed_version": "1.0.0",
        "package_description": "jinja 2.7.2 fixes a security issue: Changed the default folder for the filesystem cache to be user specific and read and write protected on UNIX systems.  See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734747 for more information.",
        "var": "25863"
    }
]

And

for cve in cve_result:
    upper_version = cve.get("upper_version")
    # etc.

But then you'd check if upper_version is None right? And if safety always return length-5 arrays, it's pretty safe to do:

for cve in cve_result:
    package_name, upper_version, installed_version, package_description, var = cve
    # do something with these variables

You can even wrap it in a namedtuple:

from collections import namedtuple

CVE = namedtuple("CVE", "package_name upper_version installed_version package_description var")

for cve_data in cve_result:
    cve = CVE(cve_data)
    print(cve.upper_version)
    print(cve.package_name)
    # etc.

One benefit of using dictionaries instead of lists/arrays though would be that user code would not rely on the order of elements, but rather get them by name, making the code more robust to potential changes in safety.

pawamoy avatar Oct 17 '19 15:10 pawamoy

@pawamoy yep, agreed with you! I think that the benefit of being more robust for changes is a great benefit for safety. It could even be a new parameter --json-detailed so that old users that use the API still get back what they except while exposing a new method which is more detailed.

Yenthe666 avatar Oct 17 '19 17:10 Yenthe666

@Yenthe666 I agree we might need a better JSON reporter, and the --json-detailed is a good way to add this without breaking other users.

There is just one thing I am not following yet: upper_version. What you call as upper version is in fact a spec version, indicating affected version for a reported vulnerability. What happens is that some specs can include a range of versions, and there is a rare chance they might intersect each other.

Anyway, I am not saying that this is not still a good idea. I am just saying that Safety is not making a decision about which version you should upgrade to. And I don't think that's the motivation here.

rafaelpivato avatar Mar 22 '20 04:03 rafaelpivato

Also upvoting this issue as I tried today to convert the JSON output to a JUnit format (with Cucumber-JSON-to-JUnit) because the CI in my company needs it. However Cucumber failed to parse the JSON stating it was invalid (though valid per RFCs), so we would also need a more standard JSON output

Bokoblin avatar Oct 05 '20 18:10 Bokoblin

As a workaround, this jq filter will make the transformation as part of a pipeline:

jq '[.[] | { package_name: .[0], upper_version: .[1], installed_version: .[2], description: .[3], var: .[4]}]'

The enclosing square brackets can be omitted to produce the JSONL format, if desired.

adevore avatar Mar 08 '21 19:03 adevore

@harlekeyn can you look at this too?

rafaelpivato avatar Jul 08 '21 13:07 rafaelpivato

One benefit of using dictionaries instead of lists/arrays though would be that user code would not rely on the order of elements, but rather get them by name, making the code more robust to potential changes in safety.

My code recently broke indeed, because of added fields.

pawamoy avatar Jul 08 '21 14:07 pawamoy

I'm closing this because Safety 2.0 improved the JSON structure with more information and added remediations for API Keys users.

Thanks to you all for the ideas to improve our JSON report.

yeisonvargasf avatar Nov 14 '22 17:11 yeisonvargasf