cyclonedx-python-lib
cyclonedx-python-lib copied to clipboard
Components not properly in dep tree nor BOM
This is version 6.4.0
Components with unique bom_refs, but the same name, will generate an error when trying to render a dependency tree. Given this script:
#!/usr/bin/env python3
from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component, ComponentType
from cyclonedx.output.json import JsonV1Dot5
bom = Bom()
bom.metadata.component = root_component = Component(
name='myApp',
type=ComponentType.APPLICATION,
bom_ref="myApp"
)
component1 = Component(
type=ComponentType.LIBRARY,
name='some-component',
bom_ref="some-component"
)
bom.components.add(component1)
bom.register_dependency(root_component, [component1])
component2 = Component(
type=ComponentType.LIBRARY,
name='some-library',
bom_ref="some-library1"
)
bom.components.add(component2)
bom.register_dependency(component1, [component2])
component3 = Component(
type=ComponentType.LIBRARY,
name='some-library',
bom_ref="some-library2"
)
bom.components.add(component3)
bom.register_dependency(component1, [component3])
print(JsonV1Dot5(bom).output_as_string(indent=2))
I get this error when I run it:
Traceback (most recent call last):
File "/Users/tek30584/programming/cdx_lib_bugs/./duplicate_name_bug.py", line 38, in <module>
print(JsonV1Dot5(bom).output_as_string(indent=2))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/output/json.py", line 82, in output_as_string
self.generate()
File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/output/json.py", line 70, in generate
bom.validate()
File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/model/bom.py", line 600, in validate
raise UnknownComponentDependencyException(
cyclonedx.exception.model.UnknownComponentDependencyException: One or more Components have Dependency references to Components/Services that are not known in this BOM. They are: {<BomRef 'some-library2'>}
will add test cases to older versions, for regression and showcasing-purposes, and then forward-port those and add fixes.
I am planning to have all affected major versions fixed.
++++
the issue does not rely on same name of components, but the fact that both component have an equal (not identical) bom_ref - with a None
value.
Need to showcase this in tests.
the issue does not rely on same name of components, but the fact that both component have an equal (not identical) bom_ref - with a None value.
I'm not understanding this. All the components in the example script have unique bom_ref
values.
did some research and found, that component3
was not added to the BOM.
This is an expected behavior, as the properties relevant for equality of component2
are the same as of component3
.
Per CycloneDX Specification, the minimal set of equality-properties of Components are: type, name` Bom-ref is not an identity-property nor an equality property. Bom-ref's purpose is linking elements, it does not have any meaning besides this purpose.
As all is ass expected, I'll close this issue. This Issue is not locked, so further discussion may take place here, instead of opening another issue.
Ah, that is interesting. I would expect a "guaranteed unique" property would be used for testing "unique-ness." But I understand. I guess I'm drawing too much on my database experience. :) Thanks for digging in to this and doing the research!
I encountered a similar issue with a different component tree. One of the features of CycloneDX is the ability to declare nested components, which is helpful when merging multiple BOM files into a single one. In some cases, these BOM files may contain identical components. However, depending on the resulting component tree, I observed inconsistent behavior - either receiving an exception or no exception.
-
Case 1: The BOM file contains two identical components (x, y). The only difference is the 'bom-ref'. The validation works fine, because the components are nested below different root components (A, B) --> No Exception
-
Case 2: The BOM file contains two identical components (x, y). The only difference is the 'bom-ref'. The validation raises an exception, because the components are both nested below the same root component (A) --> 'UnknownComponentDependencyException'
In my opinion, the component equality validation should be performed within the context of a parent component, rather than within the context of a root component. Am I misunderstanding something?
@madpah FYI
this issue should have been closed via https://github.com/CycloneDX/cyclonedx-python-lib/pull/587 or so
From my perspective, it seems that this issue does not specifically address the same problem.
To reproduce the second case mentioned here, you can use the provided example BOM file UnkCompDepEx.json.
Here is a code example that reads the BOM file and writes it to a string:
import json
from cyclonedx.model.bom import Bom
from cyclonedx.output.json import JsonV1Dot5
with open("UnkCompDepEx.json") as input_json:
deserialized_bom = Bom.from_json(data=json.loads(input_json.read()))
print(JsonV1Dot5(deserialized_bom).output_as_string(indent=2))
When running this code, I get the 'UnknownComponentDependencyException' error message:
File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\output\json.py", line 83, in output_as_string
self.generate()
File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\output\json.py", line 71, in generate
bom.validate()
File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\model\bom.py", line 658, in validate
raise UnknownComponentDependencyException(
cyclonedx.exception.model.UnknownComponentDependencyException: One or more Components have Dependency references to Components/Services that are not known in this BOM. They are: {<BomRef 'y2' id=2229836430400>, <BomRef 'x2' id=2229836429824>}
Feel free to reach out if you need further assistance
I think the problem lies in the deduplication of SortedSet; elements are not added to the set when their value matches.
The equality operator in the Component class compares the objects' hash values, which ignores the nested BomRef
object.
Example:
>>> from cyclonedx.model.component import Component, BomRef
>>> c1 = Component(bom_ref=BomRef(value='71dd3e04-ac87-4f23-818f-3cd9a17bc785'), name='pip')
>>> c2 = Component(bom_ref=BomRef(value='3d383d8b-af76-462f-9701-503cb6c5eb00'), name='pip')
>>> c1 == c2
True
If object equality is not defined by their BomRef
, the assumption that BomRef
is a foreign key does not hold.
The related PR https://github.com/CycloneDX/cyclonedx-python-lib/pull/587 seems to deal with a similar issue, where certain fields were omitted from the lt operator.
I think the problem lies in the deduplication of SortedSet; elements are not added to the set when their value matches. The equality operator in the Component class compares the objects' hash values, which ignores the nested
BomRef
object.Example:
>>> from cyclonedx.model.component import Component, BomRef >>> c1 = Component(bom_ref=BomRef(value='71dd3e04-ac87-4f23-818f-3cd9a17bc785'), name='pip') >>> c2 = Component(bom_ref=BomRef(value='3d383d8b-af76-462f-9701-503cb6c5eb00'), name='pip') >>> c1 == c2 True
If object equality is not defined by their
BomRef
, the assumption thatBomRef
is a foreign key does not hold. The related PR #587 seems to deal with a similar issue, where certain fields were omitted from the lt operator.
bom-ref
is intentional not part of the __eq__
or __hash__
method - it has no meaning, it is a mere reference-indicator.
c1
and c2
are the same component -> both are pip
Fair point but when there are multiple BOM files sourced in the python script, this is no longer applicable.
When merging sets of Vulnerability
objects from separate BOM files, they refer to their own Component
.
If only one of these is added to the global list of Components
, all references will fail since they assume they can use BomRef
.
For example, in this case for already loaded bom_files 1 and 2:
for loaded_bom in [bom_file1, bom_file2]:
root_bom.components |= loaded_bom.components
root_bom.dependencies |= loaded_bom.dependencies
The set difference loaded_bom.dependencies - loaded_bom.components
is empty, while root_bom.dependencies - root_bom.components
may not be (due to deduplication based on values without BomRef
).
The validate
call on the root_bom
will then throw a UnknownComponentDependencyException
as described in the issue.
Is the BomRef
excluded from the hash for optimization? It seems to work different in cyclonedx-dotnet-library.
The merge
option in cyclonedx-cli does not seem to do this, allowing for duplicate components instead.
Fair point but when there are multiple BOM files sourced in the python script, this is no longer applicable. When merging sets of
Vulnerability
objects from separate BOM files, they refer to their ownComponent
. If only one of these is added to the global list ofComponents
, all references will fail since they assume they can useBomRef
.For example, in this case for already loaded bom_files 1 and 2:
for loaded_bom in [bom_file1, bom_file2]: root_bom.components |= loaded_bom.components root_bom.dependencies |= loaded_bom.dependencies
The set difference
loaded_bom.dependencies - loaded_bom.components
is empty, whileroot_bom.dependencies - root_bom.components
may not be (due to deduplication based on values withoutBomRef
). Thevalidate
call on theroot_bom
will then throw aUnknownComponentDependencyException
as described in the issue.
then the python script you wrote needs top take care of this.
bom-ref
is intentional not part of the__eq__
or__hash__
method - it has no meaning, it is a mere reference-indicator.c1
andc2
are the same component -> both arepip
I have submitted https://github.com/CycloneDX/cyclonedx-cli/issues/399 to address the discrepancy between the implementation in python and c# libraries. It is however unclear to me as to where the intended behaviour is specified. Do you know if it is somewhere in the spec? There's nothing mentioned in the CycloneDX JSON Reference, although that's arguably the wrong location to explain deduplication.
Note also that the MergeTests for the C# library explicitly do not take bom-ref
of components into account.
For that matter, this also isn't tested in neither test_component.py nor test_model_component.py.
NB - I'm using v1.5 of the spec because the C# library (and therefore cyclonedx-cli
) does not yet support v1.6.
bom-ref
is intentional not part of the__eq__
or__hash__
method - it has no meaning, it is a mere reference-indicator.c1
andc2
are the same component -> both arepip
I was wondering if duplicate components in nested objects are supposed to be deduplicated, or if they should not be. Currently, the equality check is only called in set addition - so not in any parent component within the tree. See also https://github.com/aquasecurity/trivy/discussions/7532#discussioncomment-10677327; should nesting these components be a solution? However, if nesting would be an allowed solution - wouldn't that also mean vulnerabilities have to be duplicated?
bom-ref
is intentional not part of the__eq__
or__hash__
method - it has no meaning, it is a mere reference-indicator.c1
andc2
are the same component -> both arepip
I was wondering if duplicate components in nested objects are supposed to be deduplicated, or if they should not be. Currently, the equality check is only called in set addition - so not in any parent component within the tree. See also aquasecurity/trivy#7532 (comment); should nesting these components be a solution? However, if nesting would be an allowed solution - wouldn't that also mean vulnerabilities have to be duplicated?
not intended for nested of parallels, as they may contextualize the individual outer component.
It seems that deserialization silently loads broken bom data, in the case of duplicate components as earlier referred to.
EDIT: Created separate issue - https://github.com/CycloneDX/cyclonedx-python-lib/issues/677
I'm not sure if this still belongs in the current issue, or if I should open a separate ticket with these findings.
please open a dedicated issue for that.