osv-scanner feat(output): use new internal data structure

This is an initial attempt to convert the table output to use the new Output.Result and co that is being used for container scanning.

Overall, I think I'm actually pretty close, but I'm going to put it down for now to focus on slog integration

Feb 12 '25 00:02 G-Rath

(I think we might also have a sort bug or two, but first I want to confirm if the test cases are actually valid since that'll be a huge source of difference either way)

Feb 12 '25 00:02 G-Rath

@hogo6002 I'm looking at picking this back up since it sounds like something we want overall?

I think there's two main points:

I've switched to using source.Name, only caveat is that has the "type" included with a colon which'll be present even if there is no type; for now to maintain the existing output I'm trimming that colon prefix, but that might be something we want to change in future
it looks like the old logic was sorting first by package source/path, whereas now we're sorting by ecosystem; personally I think it's better to be sorted by source first, but wanted to make sure we're in agreement on that before I look into changing it

(I'm not sure about the other sorting changes, but they look at least harmless enough to ignore them, though I imagine I'll probably find out the reason for them too if I dig into 2.)

Mar 13 '25 19:03 G-Rath

I've switched to using source.Name, only caveat is that has the "type" included with a colon which'll be present even if there is no type; for now to maintain the existing output I'm trimming that colon prefix, but that might be something we want to change in future

sounds good to me. We can just trim the colon prefix first.

it looks like the old logic was sorting first by package source/path, whereas now we're sorting by ecosystem; personally I think it's better to be sorted by source first, but wanted to make sure we're in agreement on that before I look into changing it

Within each ecosystem, it also sorts by source name. It groups all sources from one ecosystem together, which benefits the display of container scanning results. For project scanning, I think we could also separate the table into multiple tables by ecosystem, similar to the package table view. Then, we can just remove the ecosystem column from the result table. If we don't want to make a big change, I think it's fine to show vulnerabilities from the same ecosystem together.

Mar 13 '25 23:03 hogo6002

For project scanning, I think we could also separate the table into multiple tables by ecosystem, similar to the package table view ...

I think you might have misunderstood - I'm asking if we want to be sorting first by source, or by ecosystem, as previously we were doing it by source but now we're doing it by ecosystem, and I personally think it would be better to sort by source first; if it's already been decided to not sort by source first intentionally, then so be it - but I'm not sure if that is the case, or if we've just defaulted to sorting by ecosystem (e.g. as a result of this new structure having been built for container scanning first, where the preference is by ecosystem).

The reason why I think "source first" would be better is that's one of the first things people think about as part of triaging and actioning - consider, you're scanning multiple codebases each with their own package-lock.json; you'd see say there's two entries for one of the lockfiles, and so you know that one has a low number of vulnerabilities, and for another lockfile there's a much larger list. Also, when starting to action those, you need to go to each package-lock.json file in their respective locations to run commands for actioning, and only then do I care about what those commands are (which are provided by the "ecosystem", in part).

For project scanning, I think we could also separate the table into multiple tables by ecosystem, similar to the package table view

I think this is a good idea in either case, but it doesn't answer if we do the output by ecosystem or by source.

This could be something that's actually worth a flag e.g. --group-by=<source|ecosystem>

Mar 13 '25 23:03 G-Rath

After writing the above out, I dug back into the codebase and I think the answer is it's already been decided to sort by ecosystem first so feel free not to spend time rebutting if that's the case 🙂

I personally do still feel I'd prefer the order by source than ecosystem because it feels less surprising to me, and that a flag for that could be useful, but among other things I think I'm a little biased by our test suite using "one, two, three" type file names which make it easy to mentally sort so in the wild it might be less surprising 🤷

Mar 13 '25 23:03 G-Rath

I personally do still feel I'd prefer the order by source than ecosystem because it feels less surprising to me, and that a flag for that could be useful, but among other things I think I'm a little biased by our test suite using "one, two, three" type file names which make it easy to mentally sort so in the wild it might be less surprising 🤷

Yes, for cases where users scan multiple projects, sorting by source only might make it easier to view the source with the highest number of vulnerabilities. (But also, in real-life scans, a source usually only has vulns from one ecosystem, so all vulns from a source should already be together.) The best approach, I think, is to follow the HTML output method: group all sources from one ecosystem together but separate each source into individual tables. That way, it's quite straightforward to review the number of vulnerabilities for each source. We've implemented this in both the vertical and HTML outputs. For this PR, I feel we can try to make less changes, which means keeping everything in one large table, as this is less surprising to users. If we receive complaints from users about this sorting, we can then separate them into multiple tables or add extra flags. For now, I think it's fine. (Personally, I wish users would use the vertical and HTML outputs more than the table output.)

Mar 14 '25 00:03 hogo6002

The best approach, I think, is to follow the HTML output method: group all sources from one ecosystem together but separate each source into individual tables.

Personally I still think the other way around, especially for the HTML output - I tend to think about codebases/projects first, then ecosystems, but I think we can put a pin in this for now 🙃

I might see if I can get the old implementation sorting by ecosystem first to reduce the diff here, but otherwise I think this is resolved for now.

(Personally, I wish users would use the vertical and HTML outputs more than the table output.)

That's actually a really good reminder - weren't we going to make the vertical output the default?

Mar 14 '25 01:03 G-Rath

Yeah we were - is that something we still want to do? cc @another-rex

fwiw, I don't think that's actually technically a breaking change but we might as well do it now if we want before v2 goes out

Mar 14 '25 01:03 G-Rath

This pull request has not had any activity for 60 days and will be automatically closed in two weeks

May 13 '25 01:05 github-actions[bot]

hey, is the PR ready for review?

Jun 25 '25 00:06 hogo6002

@hogo6002 yeah it should be - I think this ended up falling off my radar between other work and then you were off for a week or so and then I forgot about it again 😅

I think it should be pretty much good to go, but ultimately I think it would be best if you just do a fresh review rather than us trying to remember what we were discussing 2+ months ago.

(I'm happy to do a new PR too if that'd be cleaner)

Jun 25 '25 01:06 G-Rath

yeah oh that's right, for some reason now the table output is using wrong widths for columns so I need to figure out what's happening there - still @hogo6002 feel free to look over the changes and let me know if there's anything else that you think should be actioned

Jun 25 '25 01:06 G-Rath

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 66.06%. Comparing base (921d074) to head (938ddcb). Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1609      +/-   ##
==========================================
+ Coverage   66.02%   66.06%   +0.04%     
==========================================
  Files         170      170              
  Lines       16222    16243      +21     
==========================================
+ Hits        10710    10731      +21     
  Misses       4857     4857              
  Partials      655      655

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Jul 06 '25 22:07 codecov-commenter

I will merge it now. I think we need a followup PR to hide unimportant vulns when no --all-vulns flag. We can also add fixed version info into table output

Jul 07 '25 04:07 hogo6002

Yeah that is probably the way to go given how much rebasing this has needed anytime something else is merged in, and then we can work on building on this.

fwiw I'm starting to suspect our intermediate struct(s) is handling far too much stuff when instead it should be just holding structured data and instead the output package should be putting it together for output e.g. source.Name is being set to source.Type + source.Path only for us to later strip that out in output since we just want source.Path (for now) - I've not dug into this too deeply and wouldn't be surprised if at least some of this is the result of stuff out of our control, but worth keeping in mind for future work in this area I think

Jul 07 '25 04:07 G-Rath