
Support for regex in filter object while running /sbom

Open arkajnag23 opened this issue 1 year ago • 9 comments

Discussed in https://github.com/CycloneDX/cdxgen/discussions/1261

Originally posted by arkajnag23 July 23, 2024

I am using CDXGEN in server mode and calling the POST endpoint (/sbom) to generate the SBOM. I have multi-module Maven projects that include Angular JS + Maven, and CDXGEN seems to generate the SBOM with the required components. Now I want to exclude certain groups/artifacts to generate a filtered SBOM. Since I want to exclude rather than include, I tried using a regular expression with a negative lookahead, as below:

curl -X POST http://localhost:9090/sbom \
  -H "Content-Type: application/json" \
  -d '{
  "path": "/var/<workspace path>",
  "type": "maven,js",
  "multiProject": true,
  "resolveTransitive": true,
  "recurse": true,
  "installDeps": true,
  "filter": "^(?!.*(abc\\|test|)).*$"
}'

While going through the source code, it seems that the filterBom method doesn't support regular expressions.

Can someone provide some support on the same?

arkajnag23 avatar Jul 23 '24 08:07 arkajnag23

This is correct. exclude is not supported for server mode. Best way to move this forward is to find a contributor, since this is a non-trivial effort.

prabhu avatar Jul 23 '24 09:07 prabhu

@arkajnag23 could you try using the exclude attribute with the latest version?

prabhu avatar Jul 24 '24 14:07 prabhu

@prabhu Are you referring to 10.8.9?

arkajnag23 avatar Jul 24 '24 14:07 arkajnag23

Thanks @prabhu for adding exclude in server mode, but this does not actually resolve my issue. As described in the documentation, exclude is mainly for removing files and directories. I was trying to find an option via filter to exclude packages, or something to exclude by group ID or artifact ID, such as an excludeGroups or excludeArtifacts option. Reason: when we scan, analyze, and generate the SBOM, it also includes our internal libraries/dependencies, which shouldn't be part of the final report.

Since filter accepts package details, my initial attempt was to use a regex with a negative lookahead to remove what I don't need in the filtered SBOM. I also understand that we can export MVN_ARGS, but that can bring complexities of its own.

arkajnag23 avatar Jul 24 '24 15:07 arkajnag23

Filter is an array of strings where you can pass any part of a purl like group or package name; even maven and gradle profile names.

https://github.com/CycloneDX/cdxgen/blob/be4e4f424a984fc96cd63173d14cd13098dd2865/lib/server/openapi.yaml#L263

prabhu avatar Jul 24 '24 15:07 prabhu

@prabhu If my understanding is correct, filter accepts an array of strings naming the packages we want to include/extract, not the ones we want to exclude. The number of packages we want to exclude is limited, whereas the number we want to include can be unlimited.

If the filterBom method could support regex, it would be really useful to define something like this: "^(?!.*(abc\|test|)).*$" so that filter knows what not to include.
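
To illustrate the idea, here is a minimal sketch of what a regex-aware exclusion could look like. It is not cdxgen's filterBom implementation: the function name `excludeByRegex` and the sample components are hypothetical, and the sketch only assumes that each component carries a `purl` string, as in CycloneDX JSON. Note that the pattern as quoted in this thread ends with an empty alternative (`|)`), so its negative lookahead would reject every string; the sketch therefore uses a simplified pattern and inverts the match itself.

```javascript
// Hypothetical helper: drop every component whose purl matches the pattern.
// Assumes CycloneDX-style components with a `purl` string field.
function excludeByRegex(components, pattern) {
  const re = new RegExp(pattern);
  return components.filter((c) => !re.test(c.purl || ""));
}

// Sample data invented for this illustration.
const regexSample = [
  { purl: "pkg:maven/com.abc/internal-lib@1.0.0" },
  { purl: "pkg:maven/org.example/event-hub-tests@2.1.0" },
  { purl: "pkg:maven/org.apache.commons/commons-lang3@3.14.0" },
];

// Exclude anything containing "abc" or "test" in its purl.
const keptByRegex = excludeByRegex(regexSample, "abc|test");
console.log(keptByRegex.map((c) => c.purl));
```

Because the helper negates the match, a plain alternation is enough and no lookahead is needed.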

arkajnag23 avatar Jul 24 '24 15:07 arkajnag23

Filter is to exclude. Only is to include. Can you give it a try please?
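
As a rough illustration of the two semantics described here, the sketch below treats `filter` as a list of substrings to exclude and `only` as a list of substrings to include, both matched against the purl. It is a sketch under those assumptions, not cdxgen's actual code: `applyFilterAndOnly` and the sample purls are hypothetical.

```javascript
// Hypothetical sketch: `filter` removes matching components, `only` keeps them.
function applyFilterAndOnly(components, { filter = [], only = [] } = {}) {
  return components.filter((c) => {
    const purl = c.purl || "";
    // Exclude when any filter term is a substring of the purl.
    if (filter.some((term) => purl.includes(term))) return false;
    // When `only` is given, keep just the components matching one of its terms.
    if (only.length > 0) return only.some((term) => purl.includes(term));
    return true;
  });
}

// Sample data invented for this illustration.
const semanticsSample = [
  { purl: "pkg:maven/grid.runtime/grid-core@1.2.0" },
  { purl: "pkg:maven/org.springframework/spring-core@6.1.0" },
  { purl: "pkg:npm/rxjs@7.8.1" },
];

const afterFilter = applyFilterAndOnly(semanticsSample, { filter: ["grid.runtime"] });
const afterOnly = applyFilterAndOnly(semanticsSample, { only: ["org.springframework"] });
console.log(afterFilter.length, afterOnly.length);
```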

prabhu avatar Jul 24 '24 18:07 prabhu

Thanks @prabhu for clarifying. But the purl "contains" check seems to be happening on another field rather than on the purl object. Correct me if I am wrong here: [image] rather than verifying on: [image]

When I use this curl request, many dependencies are analyzed.

curl -X POST http://localhost:9090/sbom -H "Content-Type: application/json" -d '{
  "path": "/var/EventHub/event-hub-core/event-hub-core/",
  "type": "maven,js",
  "multiProject": true,
  "resolveTransitive": true,
  "recurse": true,
  "installDeps": true,
  "filter" : ["grid.runtime","event-hub-tests"]
}' > event-hub-sbom.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1473k    0 1472k  100   274   8070      1  0:04:34  0:03:06  0:01:28  386k

whereas when using the request below, the result is very different

curl -X POST http://localhost:9090/sbom -H "Content-Type: application/json" -d '{
  "path": "/var/EventHub/event-hub-core/event-hub-core/",
  "type": "maven,js",
  "multiProject": true,
  "resolveTransitive": true,
  "recurse": true,
  "installDeps": true,
  "filter" : ["grid.runtime","event-hub","event-hub-tests"]
}' > event-hub-sbom-1.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1578    0  1292  100   286      6      1  0:04:46  0:03:11  0:01:35   335

This appears to be because the verification is happening on a different field: value rather than purl.

arkajnag23 avatar Jul 24 '24 19:07 arkajnag23

@prabhu Together with my comment above, I want to clarify why filter was designed this way: "Use --filter to filter components containing the string in the purl or components.properties.value." When components.properties.value matches, most components get excluded/filtered, and we can't focus on the actual dependencies. Couldn't matching on components.properties.value be handled by exclude instead?

Was there any specific business requirement to have the filter check on components.properties.value?
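
The effect described in this comment can be reproduced with a small sketch. It assumes, based on the documentation sentence quoted here, that a filter term is compared against both the purl and every components.properties.value. The helper `matchesFilter`, the `SrcFile` property name, the `pom.xml` path suffix, and the sample component are hypothetical and are not taken from cdxgen's source.

```javascript
// Hypothetical sketch of a substring check over the purl and all property values.
function matchesFilter(component, terms) {
  const values = [component.purl || ""];
  for (const prop of component.properties || []) {
    values.push(String(prop.value));
  }
  return terms.some((term) => values.some((v) => v.includes(term)));
}

// Invented sample: a third-party library found under an "event-hub" directory.
const thirdParty = {
  purl: "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.17.0",
  properties: [
    { name: "SrcFile", value: "/var/EventHub/event-hub-core/event-hub-core/pom.xml" },
  ],
};

// "event-hub" is absent from the purl but present in the property value,
// so the component counts as a match and would be filtered out.
const matchesBroadTerm = matchesFilter(thirdParty, ["event-hub"]);
// "event-hub-tests" appears in neither field, so the component would be kept.
const matchesNarrowTerm = matchesFilter(thirdParty, ["grid.runtime", "event-hub-tests"]);
console.log(matchesBroadTerm, matchesNarrowTerm);
```

Under these assumptions, a short term such as "event-hub" that also occurs in a path-like property value would remove nearly every component, which is consistent with the much smaller second response shown above.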

arkajnag23 avatar Jul 25 '24 11:07 arkajnag23