credo icon indicating copy to clipboard operation
credo copied to clipboard

`--format=sarif` still outputs summary with using `mix credo diff`

Open mashton opened this issue 1 year ago • 2 comments

Environment

  • Credo version (mix credo -v): 1.7.7
  • Erlang/Elixir version (elixir -v): Erlang 24/Elixir 1.16.3
  • Operating system: macos 14.6.1

What were you trying to do?

use SARIF format as output, write to a file, and upload to GHAS

Expected outcome

> mix credo diff --from-git-merge-base origin/main  --format=sarif
{
    "just": "sarif formatted JSON"
}

Actual outcome

> mix credo diff --from-git-merge-base origin/main  --format=sarif
{
   "not_just": "sarif formatted JSON"
}
Diffing 3 source files in working dir with main ...
Please report incorrect results: https://github.com/rrrene/credo/issues
Analysis took 0.03 seconds (0.03s to load, 0.00s running 55 checks on 3 files)
Changes between <hash> and working dir:

+  added no issues,
✔  fixed no issues, and
~  kept no issues.

I'm totally willing to try my hand at a PR. Let me know if this seems to be a bug.

mashton avatar Sep 10 '24 15:09 mashton

@mashton You are right.

The big question is, what should that output look like? Only list the new issues?

rrrene avatar Nov 04 '24 20:11 rrrene

The big question is, what should that output look like? Only list the new issues?

@rrrene, good point. Tricky question.

I think the SARIF way to do this is to return all the results and indicate in each result object whether it's a new issue or an existing one.

Here's what I see in the SARIF spec along these lines: The run object may have a baselineId property which points to the id of the run object off which the current run is based. If a run object has a baselineId, then its results may each have a baselineState indicating whether that change is "new", "existing" or "absent" when compared to the base.

So to be concrete, we can imagine this scenario: We have a github action that runs mix credo diff --from-git-merge-base origin/main --format=sarif on each PR The resulting file might look like:

{
   "$schema": "https:/1/schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.5.json",
   "version": "2.1.0",
   "runs": [
      {
         "id": "<commit hash for target of analysis>",
         "baselineId": "<commit hash of origin/main>",
         "results": [
            {
               "ruleId": "EX3009",
               "baselineState": "new",
               "otherStuff": "..."
            },
            {
               "ruleId": "EX3007",
               "baselineState": "existing",
               "otherStuff": "..."
            },
            {
               "ruleId": "EX3023",
               "baselineState": "absent",
               "otherStuff": "..."
            }
         ]
      }
   ]
}

What's not immediately clear to me is whether it's meaningful for a baselineId to point to the id of a run from a different artifact. My intuition is that the "result management system" (SARIF's name for the system interpreting the SARIF artifact) is responsible for making inferences about run ids and baselineIds.

Thoughts?

mashton avatar Nov 07 '24 20:11 mashton