cyclonedx-python-lib icon indicating copy to clipboard operation
cyclonedx-python-lib copied to clipboard

Components not properly in dep tree nor BOM

Open jkugler opened this issue 1 year ago • 17 comments

This is version 6.4.0

Components with unique bom_refs, but the same name, will generate an error when trying to render a dependency tree. Given this script:

#!/usr/bin/env python3

from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component, ComponentType
from cyclonedx.output.json import JsonV1Dot5

bom = Bom()
bom.metadata.component = root_component = Component(
    name='myApp',
    type=ComponentType.APPLICATION,
    bom_ref="myApp"
)

component1 = Component(
    type=ComponentType.LIBRARY,
    name='some-component',
    bom_ref="some-component"
)
bom.components.add(component1)
bom.register_dependency(root_component, [component1])

component2 = Component(
    type=ComponentType.LIBRARY,
    name='some-library',
    bom_ref="some-library1"
)
bom.components.add(component2)
bom.register_dependency(component1, [component2])

component3 = Component(
    type=ComponentType.LIBRARY,
    name='some-library',
    bom_ref="some-library2"
)
bom.components.add(component3)
bom.register_dependency(component1, [component3])

print(JsonV1Dot5(bom).output_as_string(indent=2))

I get this error when I run it:

Traceback (most recent call last):
  File "/Users/tek30584/programming/cdx_lib_bugs/./duplicate_name_bug.py", line 38, in <module>
    print(JsonV1Dot5(bom).output_as_string(indent=2))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/output/json.py", line 82, in output_as_string
    self.generate()
  File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/output/json.py", line 70, in generate
    bom.validate()
  File "/Users/tek30584/programming/cdx_lib_bugs/.venv/lib/python3.11/site-packages/cyclonedx/model/bom.py", line 600, in validate
    raise UnknownComponentDependencyException(
cyclonedx.exception.model.UnknownComponentDependencyException: One or more Components have Dependency references to Components/Services that are not known in this BOM. They are: {<BomRef 'some-library2'>}

jkugler avatar Jan 27 '24 02:01 jkugler

will add test cases to older versions, for regression and showcasing-purposes, and then forward-port those and add fixes.

I am planning to have all affected major versions fixed.

++++

the issue does not rely on same name of components, but the fact that both component have an equal (not identical) bom_ref - with a None value. Need to showcase this in tests.

jkowalleck avatar Jan 27 '24 08:01 jkowalleck

the issue does not rely on same name of components, but the fact that both component have an equal (not identical) bom_ref - with a None value.

I'm not understanding this. All the components in the example script have unique bom_ref values.

jkugler avatar Jan 27 '24 22:01 jkugler

did some research and found, that component3 was not added to the BOM. This is an expected behavior, as the properties relevant for equality of component2 are the same as of component3.

Per CycloneDX Specification, the minimal set of equality-properties of Components are: type, name` Bom-ref is not an identity-property nor an equality property. Bom-ref's purpose is linking elements, it does not have any meaning besides this purpose.

As all is ass expected, I'll close this issue. This Issue is not locked, so further discussion may take place here, instead of opening another issue.

jkowalleck avatar Jan 29 '24 17:01 jkowalleck

Ah, that is interesting. I would expect a "guaranteed unique" property would be used for testing "unique-ness." But I understand. I guess I'm drawing too much on my database experience. :) Thanks for digging in to this and doing the research!

jkugler avatar Jan 29 '24 17:01 jkugler

I encountered a similar issue with a different component tree. One of the features of CycloneDX is the ability to declare nested components, which is helpful when merging multiple BOM files into a single one. In some cases, these BOM files may contain identical components. However, depending on the resulting component tree, I observed inconsistent behavior - either receiving an exception or no exception.

  • Case 1: The BOM file contains two identical components (x, y). The only difference is the 'bom-ref'. The validation works fine, because the components are nested below different root components (A, B) --> No Exception

  • Case 2: The BOM file contains two identical components (x, y). The only difference is the 'bom-ref'. The validation raises an exception, because the components are both nested below the same root component (A) --> 'UnknownComponentDependencyException'

models

In my opinion, the component equality validation should be performed within the context of a parent component, rather than within the context of a root component. Am I misunderstanding something?

eugenhoffmann avatar Apr 11 '24 12:04 eugenhoffmann

@madpah FYI

jkowalleck avatar Apr 11 '24 17:04 jkowalleck

this issue should have been closed via https://github.com/CycloneDX/cyclonedx-python-lib/pull/587 or so

jkowalleck avatar Apr 24 '24 21:04 jkowalleck

From my perspective, it seems that this issue does not specifically address the same problem.

To reproduce the second case mentioned here, you can use the provided example BOM file UnkCompDepEx.json.

Here is a code example that reads the BOM file and writes it to a string:

import json
from cyclonedx.model.bom import Bom
from cyclonedx.output.json import JsonV1Dot5

with open("UnkCompDepEx.json") as input_json:
    deserialized_bom = Bom.from_json(data=json.loads(input_json.read()))

print(JsonV1Dot5(deserialized_bom).output_as_string(indent=2))

When running this code, I get the 'UnknownComponentDependencyException' error message:

File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\output\json.py", line 83, in output_as_string
  self.generate()
File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\output\json.py", line 71, in generate
  bom.validate()
File "F:\Git\rsb-sbom\venv\lib\site-packages\cyclonedx\model\bom.py", line 658, in validate
  raise UnknownComponentDependencyException(
cyclonedx.exception.model.UnknownComponentDependencyException: One or more Components have Dependency references to Components/Services that are not known in this BOM. They are: {<BomRef 'y2' id=2229836430400>, <BomRef 'x2' id=2229836429824>}

Feel free to reach out if you need further assistance

eugenhoffmann avatar Apr 27 '24 08:04 eugenhoffmann

I think the problem lies in the deduplication of SortedSet; elements are not added to the set when their value matches. The equality operator in the Component class compares the objects' hash values, which ignores the nested BomRef object.

Example:

>>> from cyclonedx.model.component import Component, BomRef
>>> c1 = Component(bom_ref=BomRef(value='71dd3e04-ac87-4f23-818f-3cd9a17bc785'), name='pip')
>>> c2 = Component(bom_ref=BomRef(value='3d383d8b-af76-462f-9701-503cb6c5eb00'), name='pip')
>>> c1 == c2
True

If object equality is not defined by their BomRef, the assumption that BomRef is a foreign key does not hold. The related PR https://github.com/CycloneDX/cyclonedx-python-lib/pull/587 seems to deal with a similar issue, where certain fields were omitted from the lt operator.

wkoot avatar Sep 11 '24 09:09 wkoot

I think the problem lies in the deduplication of SortedSet; elements are not added to the set when their value matches. The equality operator in the Component class compares the objects' hash values, which ignores the nested BomRef object.

Example:

>>> from cyclonedx.model.component import Component, BomRef
>>> c1 = Component(bom_ref=BomRef(value='71dd3e04-ac87-4f23-818f-3cd9a17bc785'), name='pip')
>>> c2 = Component(bom_ref=BomRef(value='3d383d8b-af76-462f-9701-503cb6c5eb00'), name='pip')
>>> c1 == c2
True

If object equality is not defined by their BomRef, the assumption that BomRef is a foreign key does not hold. The related PR #587 seems to deal with a similar issue, where certain fields were omitted from the lt operator.

bom-ref is intentional not part of the __eq__ or __hash__ method - it has no meaning, it is a mere reference-indicator. c1 and c2 are the same component -> both are pip

jkowalleck avatar Sep 11 '24 10:09 jkowalleck

Fair point but when there are multiple BOM files sourced in the python script, this is no longer applicable. When merging sets of Vulnerability objects from separate BOM files, they refer to their own Component. If only one of these is added to the global list of Components, all references will fail since they assume they can use BomRef.

For example, in this case for already loaded bom_files 1 and 2:

for loaded_bom in [bom_file1, bom_file2]:
  root_bom.components |= loaded_bom.components
  root_bom.dependencies |= loaded_bom.dependencies

The set difference loaded_bom.dependencies - loaded_bom.components is empty, while root_bom.dependencies - root_bom.components may not be (due to deduplication based on values without BomRef). The validate call on the root_bom will then throw a UnknownComponentDependencyException as described in the issue.

Is the BomRef excluded from the hash for optimization? It seems to work different in cyclonedx-dotnet-library. The merge option in cyclonedx-cli does not seem to do this, allowing for duplicate components instead.

wkoot avatar Sep 11 '24 10:09 wkoot

Fair point but when there are multiple BOM files sourced in the python script, this is no longer applicable. When merging sets of Vulnerability objects from separate BOM files, they refer to their own Component. If only one of these is added to the global list of Components, all references will fail since they assume they can use BomRef.

For example, in this case for already loaded bom_files 1 and 2:

for loaded_bom in [bom_file1, bom_file2]:
  root_bom.components |= loaded_bom.components
  root_bom.dependencies |= loaded_bom.dependencies

The set difference loaded_bom.dependencies - loaded_bom.components is empty, while root_bom.dependencies - root_bom.components may not be (due to deduplication based on values without BomRef). The validate call on the root_bom will then throw a UnknownComponentDependencyException as described in the issue.

then the python script you wrote needs top take care of this.

jkowalleck avatar Sep 11 '24 10:09 jkowalleck

bom-ref is intentional not part of the __eq__ or __hash__ method - it has no meaning, it is a mere reference-indicator. c1 and c2 are the same component -> both are pip

I have submitted https://github.com/CycloneDX/cyclonedx-cli/issues/399 to address the discrepancy between the implementation in python and c# libraries. It is however unclear to me as to where the intended behaviour is specified. Do you know if it is somewhere in the spec? There's nothing mentioned in the CycloneDX JSON Reference, although that's arguably the wrong location to explain deduplication.

Note also that the MergeTests for the C# library explicitly do not take bom-ref of components into account. For that matter, this also isn't tested in neither test_component.py nor test_model_component.py. NB - I'm using v1.5 of the spec because the C# library (and therefore cyclonedx-cli) does not yet support v1.6.

wkoot avatar Sep 11 '24 11:09 wkoot

bom-ref is intentional not part of the __eq__ or __hash__ method - it has no meaning, it is a mere reference-indicator. c1 and c2 are the same component -> both are pip

I was wondering if duplicate components in nested objects are supposed to be deduplicated, or if they should not be. Currently, the equality check is only called in set addition - so not in any parent component within the tree. See also https://github.com/aquasecurity/trivy/discussions/7532#discussioncomment-10677327; should nesting these components be a solution? However, if nesting would be an allowed solution - wouldn't that also mean vulnerabilities have to be duplicated?

wkoot avatar Sep 18 '24 07:09 wkoot

bom-ref is intentional not part of the __eq__ or __hash__ method - it has no meaning, it is a mere reference-indicator. c1 and c2 are the same component -> both are pip

I was wondering if duplicate components in nested objects are supposed to be deduplicated, or if they should not be. Currently, the equality check is only called in set addition - so not in any parent component within the tree. See also aquasecurity/trivy#7532 (comment); should nesting these components be a solution? However, if nesting would be an allowed solution - wouldn't that also mean vulnerabilities have to be duplicated?

not intended for nested of parallels, as they may contextualize the individual outer component.

jkowalleck avatar Sep 18 '24 13:09 jkowalleck

It seems that deserialization silently loads broken bom data, in the case of duplicate components as earlier referred to.

EDIT: Created separate issue - https://github.com/CycloneDX/cyclonedx-python-lib/issues/677

wkoot avatar Sep 18 '24 17:09 wkoot

I'm not sure if this still belongs in the current issue, or if I should open a separate ticket with these findings.

please open a dedicated issue for that.

jkowalleck avatar Sep 18 '24 19:09 jkowalleck