Mono-Repo with different types not scanned correctly
I have tested two project layouts:
1. Projects in separate folders, like this:
   - backend/build.gradle.kts
   - frontend/package.json
2. Projects nested within each other, like this:
   - build.gradle.kts
   - frontend/package.json
Both the documentation and the ChatGPT assistant suggest that a command as simple as cdxgen . should automatically find all projects and generate a combined SBOM for them.
The command does seem to find the projects in both cases, at least the log contains things related to npm and gradle, however:
In case (1), the resulting bom.json contains components for all dependencies of both projects. However, when importing the BOM into a Dependency-Track project, only one of the projects is part of the dependency tree. In case (2), the resulting bom.json only contains components for one of the projects, the root project.
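To make the symptom in case (1) easier to check without importing into Dependency-Track, here is a rough sketch that lists components which are present in the BOM but not reachable from the root component. It assumes the standard CycloneDX JSON layout (metadata.component, components, and a top-level dependencies array with ref/dependsOn); the function name is made up for this example and is not part of cdxgen.

```python
from collections import deque


def unreachable_components(bom: dict) -> list:
    """Return bom-refs of components that cannot be reached from metadata.component."""
    root = bom.get("metadata", {}).get("component", {}).get("bom-ref")
    # Adjacency list built from the top-level "dependencies" array.
    graph = {d["ref"]: d.get("dependsOn", []) for d in bom.get("dependencies", [])}
    seen = set()
    queue = deque([root] if root else [])
    # Breadth-first walk over the dependency graph, starting at the root component.
    while queue:
        ref = queue.popleft()
        if ref in seen:
            continue
        seen.add(ref)
        queue.extend(graph.get(ref, []))
    # Components without a bom-ref are skipped because they cannot appear in the graph.
    refs = [c.get("bom-ref") for c in bom.get("components", [])]
    return [ref for ref in refs if ref and ref not in seen]
```

Loading bom.json with json.load and passing the result to this function should return a non-empty list if components from one of the projects are dangling outside the tree.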
Can you share a sample repo to reproduce the issue? You must be facing two limitations:
- There must be a build.gradle in the root, since there is some hardcoded path in a few places.
- Automatic installation for npm (when there are no lock files) is limited to just one I think.
A range of samples will help improve this feature significantly.
I can provide example projects next week
https://github.com/pschichtel/cdxgen-reproducer
The repo contains two projects, each once in the nested structure and once in the side-by-side structure described above.
When importing either BOM into Dependency-Track, the dependency tree shows only the npm project, but the components still include e.g. ktor from the Gradle project.
For reference screenshots of the side-by-side project version in our dependency track installation (the nested version is identical):
Thank you for the samples. This exactly hits two different limitations in gradle and npm. Fixing this is a non-trivial task, especially testing since every single line of change could break something somewhere for someone. Will you be interested in contributing a PR working with us? Or we can keep this open and see if anyone is willing to sponsor.
I personally would be willing to give it a shot, however this would definitely need some initial pointers, since I have absolutely no clue where exactly these limits are and where to start. I'd also have to check with my company whether time can be allocated to this, especially since workarounds exist (e.g. scanning each project individually and merging the BOMs).
I've had a look at the codebase with a colleague. 11.1.2 btw fixes part of the problem: The dependency tree is now complete, just incorrectly structured (npm project always becomes the root project, no matter where it lives in the folder tree).
I think we know where this would need changes to work differently, but together with all the other issues we faced with the tool today we are unlikely to continue with it. We will be focusing our effort on using more specialized sbom generators for the various different types of projects and merging the individual sboms using cyclonedx-cli. This seems to be the conclusion in other departments as well.
Have you tried running cdxgen with ordered types, for example -t gradle -t npm? cdxgen does outperform most specialized generators btw. Feel free to generate various SBOMs and use this tool to compare them.
https://github.com/AppThreat/custom-json-diff
When restricting it to just these two types the resulting sbom cannot be imported into dependency track (haven't checked why specifically).
If there are no validation errors on the cdxgen side, possibly there are some uncaught validation errors on the DT side. Is that reproducible using your sample repo?
Is Dependency Track the only platform not supporting dangling trees? I think we can remove a whole category of bugs by simply not having a root node for dependency trees representing monorepos.
@malice00 any ideas how we can approach the parent component problem in monorepos?
I'm not familiar with anything other than DT, but yes, DT only shows a tree starting at the root-component. We could leave it out for monorepos, but in DT that would mean no tree at all... Also, isn't a root-component mandatory? Haven't checked that in the specs yet...
Maybe we could use the parameters --project-XX to create a root-project in case there are multiple projects found in the working tree? If you agree, I could give that a try...
For the nested case you could rely on the directory structure to infer the relation between projects, for the side-by-side you'd definitely need a synthetic root.
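A rough sketch of that idea, assuming the project directories have already been discovered and are given as paths relative to the scan root. Both function names are made up for illustration; nothing here reflects how cdxgen currently works.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def infer_parents(project_dirs: List[str]) -> Dict[str, Optional[str]]:
    """Map each project dir to its nearest enclosing project dir, or None."""
    paths = [PurePosixPath(p) for p in project_dirs]
    parents: Dict[str, Optional[str]] = {}
    for p in paths:
        # Candidates are other project dirs that are proper ancestors of p.
        ancestors = [q for q in paths if q != p and q in p.parents]
        # The deepest ancestor is the direct parent.
        nearest = max(ancestors, key=lambda q: len(q.parts), default=None)
        parents[str(p)] = str(nearest) if nearest is not None else None
    return parents


def needs_synthetic_root(project_dirs: List[str]) -> bool:
    """More than one project without an enclosing project means no natural root."""
    top_level = [d for d, parent in infer_parents(project_dirs).items() if parent is None]
    return len(top_level) > 1
```

For the nested reproducer layout (a project at "." plus "frontend") this yields a single top-level project, while the side-by-side layout ("backend" and "frontend") yields two, which is the case that would need a synthetic root.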
What about introducing a "synthetic" project type that would look for a special file (e.g. cdxgen.yaml or so) and simply produce a component without any dependencies, based on a description inside the file? Then, together with the folder structure, dependencies between those could be inferred.
That was another idea I had, but might take a little longer to implement. We could then have the layout dictated by the user:
group: xx
name: xx
type: synthetic
subs:
  - path: component1
    group: xx_1 # in case overriding is wanted
    name: xx_1 # in case overriding is wanted
    type: gradle
  - path: component2
    type: npm
  - path: dir3/another_component
    type: npm
Maybe something like this?
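To get a feel for what such a layout would produce, here is a minimal sketch that turns it into a root component plus one dependency entry. It assumes the file has already been parsed into a dict (e.g. by a YAML library) and that refs are formed as "group/name"; the ref scheme, the output shape, and the function name are all assumptions for this example, not an existing cdxgen format.

```python
def layout_to_root(layout: dict) -> dict:
    """Build a synthetic root component and its direct dependencies from a layout dict."""
    root_ref = f"{layout['group']}/{layout['name']}"
    children = []
    for sub in layout.get("subs", []):
        # Fall back to the last path segment when no name override is given.
        name = sub.get("name") or sub["path"].rstrip("/").split("/")[-1]
        # Fall back to the root's group when no group override is given.
        group = sub.get("group") or layout["group"]
        children.append({"ref": f"{group}/{name}", "path": sub["path"], "type": sub["type"]})
    return {
        "component": {
            "bom-ref": root_ref,
            "group": layout["group"],
            "name": layout["name"],
            "type": "application",
        },
        # The synthetic root depends directly on every listed sub-project.
        "dependency": {"ref": root_ref, "dependsOn": [c["ref"] for c in children]},
        # Paths and project types that would still need to be scanned individually.
        "scans": children,
    }
```

The defaults chosen here (directory name as component name, inherited group) are one possible answer to what should happen when the configuration is incomplete.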
The question is, though, whether this could be added to the current configuration format, or if maybe a v2 is necessary for this...
How about we add a split mode to cdxgen to generate separate SBOM files per type and optionally create an index file to link them together. Such a feature is also useful for ML users since each individual BOM would be smaller.
Aggregate could then become a separate command or the users could feel free to use other cli tools?
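As a very rough illustration of a split mode, the sketch below groups the components of an existing BOM by their purl type and builds a small index. Note the assumptions: grouping by purl type is only an approximation of "per project type" (Gradle dependencies carry the maven purl type), the index format and the bom-<type>.json file names are invented for this example, and dependencies are not split at all.

```python
from collections import defaultdict


def split_by_purl_type(bom: dict) -> dict:
    """Group components by purl type and wrap each group in a minimal BOM skeleton."""
    groups = defaultdict(list)
    for comp in bom.get("components", []):
        purl = comp.get("purl", "")
        # A purl looks like "pkg:<type>/<namespace>/<name>@<version>".
        ptype = purl[4:].split("/", 1)[0] if purl.startswith("pkg:") else "unknown"
        groups[ptype].append(comp)
    return {
        ptype: {
            "bomFormat": "CycloneDX",
            "specVersion": bom.get("specVersion"),
            "components": comps,
        }
        for ptype, comps in groups.items()
    }


def build_index(parts: dict) -> dict:
    """Hypothetical index linking the per-type BOM files together."""
    return {
        "boms": [
            {"type": t, "file": f"bom-{t}.json", "componentCount": len(b["components"])}
            for t, b in sorted(parts.items())
        ]
    }
```

Each value returned by split_by_purl_type could be written to its own file, with build_index describing how the files belong together.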
Or, we get rid of logic like below that attempts to create a single parent.
https://github.com/CycloneDX/cdxgen/blob/ffc6796ee8db1fbbeda29b3fc3fb11ddc562cfce/lib/cli/index.js#L6704
In postgen, we create the parent by using parent-ref from the cli arguments or have some kind of project type hierarchy to decide which type becomes the parent. Example, if there are java and npm packages, we make the java (maven) the parent?
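A small sketch of how that selection could look, assuming each discovered project is represented as a dict with a bom-ref and a project type. The priority order, the "project_type" key, and the parent_ref argument are placeholders for this example and do not correspond to existing cdxgen options.

```python
# Assumed ordering: earlier entries win when no explicit parent is given.
TYPE_PRIORITY = ["maven", "gradle", "npm"]


def choose_parent(candidates, parent_ref=None, priority=TYPE_PRIORITY):
    """Pick the parent component: explicit ref first, otherwise by type priority."""
    if parent_ref is not None:
        for candidate in candidates:
            if candidate["bom-ref"] == parent_ref:
                return candidate
        raise ValueError(f"no candidate with bom-ref {parent_ref!r}")

    def rank(candidate):
        ptype = candidate["project_type"]
        # Unknown types sort after every known type.
        return priority.index(ptype) if ptype in priority else len(priority)

    # min() keeps the first candidate on ties, so discovery order breaks ties.
    return min(candidates, key=rank)
```

With a Gradle and an npm project, this picks the Gradle project unless the npm project's ref is passed explicitly.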
Another way to go is check the paths of the projects and if they are not nested, add a synthetic component (eg using the dir-name for the component) and plug the projects into that.
It does raise the question of what to do with the --project-XX parameters, in case they are set... Where should these be applied? On the synthetic component? And what if we would like to override the sub-projects as well?
I do really like the idea of the config-file (had been playing with something like that in my head even before this issue came up), but I think that needs to be thought out some more and have some good default if the configuration is 'incomplete'... But I'm open to give something a go -- am stuck a bit on (repo-)testing my implementation of cocoapods, so a change of subject would be welcome... 😉
Shall we meet online to discuss the ideas? Would appreciate it if you could suggest a convenient time using the calendly link below
http://calendly.com/prabhuat