Sanitize externalReferences URLs and exclude invalid ones + Strict JSON Schema validation
The goal of this PR is to fix issue #1107
It should also replace https://github.com/CycloneDX/cdxgen/pull/1128
Looks very neat! Thank you so much!
@timmyteo @setchy Could you test this locally with some repos and share your thoughts?
Great work @marob
Tested this branch against some of my repos. BOM looks good. Uploaded to DTrack 4.11.1 with BOM Validation enabled without issues
One observation: we may need to dedupe externalReferences after sanitization
for example
"externalReferences": [
  {
    "type": "vcs",
    "url": "https://github.com/follow-redirects/follow-redirects"
  },
  {
    "type": "vcs",
    "url": "git@github.com:follow-redirects/follow-redirects.git"
  }
],
became
"externalReferences": [
  {
    "type": "vcs",
    "url": "https://github.com/follow-redirects/follow-redirects.git"
  },
  {
    "type": "vcs",
    "url": "https://github.com/follow-redirects/follow-redirects.git"
  }
],
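A minimal sketch of what such a dedupe step could look like, assuming two references count as duplicates when both `type` and `url` match. `dedupeExternalReferences` is a hypothetical helper for illustration, not an existing cdxgen function.

```javascript
// Hypothetical helper (not part of cdxgen): drops externalReferences entries
// that repeat an earlier (type, url) pair, keeping the first occurrence and
// preserving the original order.
function dedupeExternalReferences(refs) {
  const seen = new Set();
  const result = [];
  for (const ref of refs) {
    // Serialize the pair as a JSON array so that values containing a
    // separator character cannot collide with each other.
    const key = JSON.stringify([ref.type, ref.url]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(ref);
    }
  }
  return result;
}

// The sanitized output from the comment above.
const sanitized = [
  { type: "vcs", url: "https://github.com/follow-redirects/follow-redirects.git" },
  { type: "vcs", url: "https://github.com/follow-redirects/follow-redirects.git" },
];
const deduped = dedupeExternalReferences(sanitized);
console.log(deduped.length); // 1
```

Keying on both fields keeps entries that share a URL but differ in type, which may or may not be the desired behaviour.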
Any ideas about the test failures? We may have to run it locally to troubleshoot, since the validation errors are not shown.
I have no idea what those tests are doing. Of all the ones that fail, only this one shows some details about the reason (https://github.com/CycloneDX/cdxgen/actions/runs/9306982345/job/25617330208?pr=1130). It looks like it's linked to strict JSON Schema validation on iri-reference.
'scm:svn:https://svn.codehaus.org/jettison/tags/jettison-1.3.7': could we try my idea of using URL parsing instead of a regex? Or check whether the value starts with http?
Can I come up with another branch that doesn't use additional dependencies? If we have more test cases, that will help.
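A rough sketch of the URL-parsing idea, using only the built-in WHATWG `URL` class so no extra dependency is needed. `isHttpUrl` is a hypothetical name, and restricting to http(s) is an assumption that may be stricter than the schema's iri-reference format requires. Note that parsing alone is not enough for the value above: it parses successfully with the scheme `scm:`, so the sketch also checks the protocol.

```javascript
// Hypothetical helper (not part of cdxgen): accepts only absolute http(s)
// URLs. Relies on the built-in URL class, which throws on unparsable input.
function isHttpUrl(value) {
  if (typeof value !== "string") return false;
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    // Not parsable as an absolute URL.
    return false;
  }
  // Parsing succeeds for any syntactically valid scheme, so the protocol
  // still has to be checked explicitly.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

console.log(isHttpUrl("https://github.com/follow-redirects/follow-redirects")); // true
console.log(isHttpUrl("scm:svn:https://svn.codehaus.org/jettison/tags/jettison-1.3.7")); // false
```

This combines both suggestions: the parse step rejects malformed values, and the protocol check plays the role of "starts with http".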
@marob Do you have time to look at the pending comments?
@prabhu Probably not this week.
Blocked by: #1134
We now have an implementation with validation alone that is merged in master. It currently logs the problematic URLs and then filters them out. Let's work on the sanitize feature in a separate branch.
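For readers following along, the "log, then filter" behaviour can be sketched roughly as below. This is an illustration only, not the code merged in master: `filterExternalReferences`, the injected `isValidUrl` predicate, and the `startsWithHttp` stand-in are all hypothetical.

```javascript
// Hypothetical sketch (not the merged implementation): keeps references whose
// url passes the supplied predicate, and logs each one that is dropped.
function filterExternalReferences(refs, isValidUrl, log = console.warn) {
  return refs.filter((ref) => {
    if (isValidUrl(ref.url)) return true;
    log(`Dropping externalReference with invalid url: ${ref.url}`);
    return false;
  });
}

// Stand-in predicate for this example only; the real validation rule may differ.
const startsWithHttp = (url) =>
  typeof url === "string" && /^https?:\/\//.test(url);

// Collect log messages in an array so the example stays quiet.
const logged = [];
const kept = filterExternalReferences(
  [
    { type: "vcs", url: "https://github.com/follow-redirects/follow-redirects" },
    { type: "vcs", url: "scm:svn:https://svn.codehaus.org/jettison/tags/jettison-1.3.7" },
  ],
  startsWithHttp,
  (message) => logged.push(message),
);
console.log(kept.length, logged.length); // 1 1
```

Passing the predicate in as a parameter would let a later sanitize step reuse the same filter with a different rule.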
@marob can you create a new branch to implement the sanitize feature for urls?
@prabhu Should we create a branch to sanitize externalReference URLs? Do we need that feature?
@marob definitely feel free to work on a new branch, although this is a lower priority atm due to lack of any requests from the community.