feat: add "vertical" output format
This adds a new "vertical" output format that is designed for humans and based on the output of osv-detector, which effectively aims to group the output relating to each entity being scanned in vertical slices:
Unfortunately I think it suffers significantly due to the assumptions made by the rest of the codebase for outputting that made sense when the final output was a table i.e. we dump a lot of information as we go about scanning, config files, vulnerability filtering, and so on that really should be grouped but currently cannot because they're all outputted at different stages - I think a way to address that could be using some sort of event-emitter type pattern so that the reporters could be responsible for deciding what they actually do (e.g. r.Emit("filtered-vulnerability", ...) and then most reporters could choose to just print immediately, and ones like "vertical" could choose to add it to an internal struct), but I think that'll involve a lot more work; for now I'm just going to ignore the pre-results output.
Resolves #85
Codecov Report
Attention: Patch coverage is 93.33333% with 14 lines in your changes missing coverage. Please review.
Project coverage is 66.55%. Comparing base (
1ea785e) to head (347ca61).
| Files | Patch % | Lines |
|---|---|---|
| pkg/reporter/vertical_reporter.go | 70.27% | 11 Missing :warning: |
| internal/output/vertical.go | 98.24% | 2 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #889 +/- ##
==========================================
+ Coverage 66.11% 66.55% +0.44%
==========================================
Files 160 162 +2
Lines 12792 13002 +210
==========================================
+ Hits 8457 8654 +197
- Misses 3877 3889 +12
- Partials 458 459 +1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I realised that the tests I've added here actually serve for all the output formatters, so I'm going to split this into two PRs - one adding a general set of tests that get used across all outputters to showcase their output, and a second adding the new "vertical" output format
I need to refactor this to not use fatih/color
LGTM - however I am not sure if we want the output in this color schema. @another-rex is there any discussion/agreement on this?
Thanks @G-Rath! It looks really nice! I tested it on container scanning, and the vertical results are much clearer than table ones. Adding some suggestions to further improve the display of container scanning results:
- The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)
- Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).
- We should probably remove Docker names from each source and add ecosystem names (e.g., change "go-tester.tar:/var/lib/dpkg/status" to "/var/lib/dpkg/status (Debian:12)").
- To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").
- We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)
- It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.
- Add uncalled vulnerabilities. (Maybe we can consider excluding uncalled vulnerabilities from the total count, or hide them by default with an option to show them)
Example (Feel free to modify):
Scanning image go-tester.tar
170 unimportant vulnerabilities have been filtered out.
Filtered 170 vulnerabilities from output
Image go-tester.tar (Debian 12.5) result
Total 7 (4 High, ...)
/artifact/main-built-with-1-21-0 (Go): found 2 package with issues
2 known vulnerabilities found
x license issue found
[email protected] is affected by the following vulnerabilities:
CVE-0000-0001: 8.0 (HIGH)
GO-0000-0001: description here
Fixed version: 1.22.0
GHSA-0000-0001: description here
[email protected] is affected by the following vulnerabilities:
CVE-0000-0002: 8.1 (HIGH)
GO-0000-0002: description here
Fixed version: 1.1
GHSA-0000-0002: description here
Fixed version: 1.1
Uncalled vulnerabilities
We found 1 potential vulnerabilities, but it is currently not being called or executed in the code
[email protected] may be affected by the following vulnerabilities:
GO-0000-0003: (when no CVE ID)
GO-0000-0003: description here (when no fixed version available)
GHSA-0000-0003: description here
/src/main (Go): found 3 packages with issues (Place other sources from the same ecosystem here)
3 known vulnerabilities found
.....
/var/lib/dpkg/status (Debian:12): found 8 packages with issues
2 known vulnerabilities found (Even though there is only one unique vulnerability, it is detected in multiple packages, so we count it multiple times in the report.)
[email protected] is affected by the following vulnerabilities:
CVE-0000-2781: 5.0 (Medium)
CVE-0000-2781:
description: description here
[email protected] is affected by the following vulnerabilities:
CVE-0000-2781: 5.0 (Medium)
CVE-0000-2781:
description: description here
@hogo6002 thanks for the review and detailed writeup! It would be really useful if you could provide some reproducations for each of your points as I'm not across all of the different aspects of the scanner like I think you are, so I'm less setup to e.g. produce a result that has a uncalled vulnerability, multiple ecosystems, image scanner results, etc.
Alternatively feel free to provide test cases since then you can just craft the situation you're thinking of rather than having to track down a real-world image/lockfile, and since we'll need those anyway to reflect the changes in the snapshots.
I think the main concern I have (if I'm understanding things right) is that your proposals would likely greatly increase the output in both directions, which is potentially a bad thing
e.g.
We could use CVE IDs as the default identifiers, along with their severity scores. For each vulnerability, we can include all associated group aliases, along with their descriptions and fixed versions. (Group aliases should not be counted as separate vulnerabilities in the total count.)
In my experience most advisories have at least three aliases: a CVE, a GHSA, and an ID from the original source (Go DB, Rust sec DB, Ruby sec DB, Python sec DB, etc) - so this would mean for every advisory you'd be looking at four lines min + each of those will have a summary that is probably very similar but not always the same that the user has to look through.
We'll also want to include a link to make it easy to access, so that's a link per advisory; and the representation of a "fix version" is typically different and multiple per advisory - some have commits, some have version ranges, and some have both. Plus there's determining which fixed version to show (either we show just the latest fix version which might not be in the same major version, or we try to determine the best fit in which case this outputter is suddenly doing a lot more complex work...).
In doing this, we'd have also added at least 2-3 levels of indenting, pushing the content further right, and this gets even worse in ecosystems like NPM that support multiple versions of the same package 😅
I definitely think there's more exploring to do with the outputting, but it feels like it could be too much to be try and tackle in this initial PR - maybe also because currently the outputters don't have any real concept of context, and that that might play a role? (i.e. they don't know when you're scanning a container vs a host vs an sbom, and I think things like summaries and os versions have different usages depending on what scanning mode you're actually in)
re your other comments:
The vertical output doesn't work when pipe results to a text file. (osv-scanner scan ... > result.txt)
I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?
Shall we add a summary at the top to Include the image name, OS version, and total vulnerability count. (OS version and total vulnerability count may need to wait for v2).
To maintain consistency, should we also display the vulnerability and license issue count at the top of each source section ("2 known vulnerabilities found").
I'll skip this until we've discussed my opener, but just fyi in case you missed it, we do already put the summary count at the end of each section which I had there are a form of block-end-marker.
It would be better to place results from the same ecosystem together. For example, all language packages could be listed first, followed by OS related packages.
I don't think we should do this at least for now because it makes an assumption about the classification and priority of ecosystems that afaik isn't captured in the osv spec
@G-Rath you are right, this is just an initial PR, we should probably not add too many new features to this. I will address the container scanning output format in a separate PR with some screenshots for better understanding.
Adding some clarification for other comments:
I can't reproduce this and is very unexpected as we're explicitly printing to stdout + matching the table reporter - can you confirm the exact command you're running and in what terminal?
I am using the command go run ./cmd/osv-scanner/ scan -r --format vertical ./project > result.txt The output file seems to have some bugs:
Another comment about uncalled vulnerabilities: when I use the command to scan a Go project, the vertical output doesn't show which vulnerabilities are uncalled.
@hogo6002 that output is correct - the weirdness you're seeing is the color characters.
It seems that the color library we're using doesn't do any special checks on when to include colors - I don't know the full details, but my understanding is that there are ways libraries can detect that they're being piped and so often these libraries have that built in.
I'll dig further into that and see what we can do
FYI we do a similar check with the table reporter to avoid adding styling when stdout isn't the terminal by passing in a terminalWidth of 0. (cmd/osv-scanner/main.go:155)
@hogo6002 color outputting should now be fixed - while looking into that, I discovered that we could simplify the other outputters a bit too so have opened #1087 doing just that; weirdly, calling DisableColor in reporter seems to only work for the vertical output so it's a little inconsistent but I'll revisit that once we've landed this since we've talked about making changes to the table output anyway (@another-rex I can't remember if we were going to completely remove the table output or just make vertical the default?)
Per our catch-up today, that just leaves call analysis to implement as the rest we'll tackle as followups - I should have mentioned it at the time, but fact that call analysis isn't in any of our output snapshots means we're missing them entirely from the output fixtures; it'd be great if someone could do a PR adding even just a basic sample of what that looks like in the scanner results struct, to save me some time - but if everyone's busy I can try to figure it out from the existing fixtures
@G-Rath Is this the call analysis output that you are looking for: https://github.com/google/osv-scanner/blob/4a3375f3b7046a791d0d1d370d559caa3c02c132/cmd/osv-scanner/snapshots/main_test.snap#L479
The first step will be making the vertical output default. I'm still not sure about whether we want to remove table output or not yet.
@another-rex @hogo6002 called vs uncalled is now being separated in the vertical output so this should be good for re-review - I've intentionally stuck with a minimal presentation for now rather than introduce a new layer of intending and headers that more explicitly call out "there are uncalled" like in @hogo6002's original sample, as I think it's serviceable and figure that'd be better to start with then over-implementing something very wordy that we might not like anyway