cdxgen icon indicating copy to clipboard operation
cdxgen copied to clipboard

Sanitize externalReferences URLs and exclude invalid ones + Strict JSON Schema validation

Open marob opened this issue 1 year ago • 13 comments

The goal of this PR is to fix issue #1107

It should also replace https://github.com/CycloneDX/cdxgen/pull/1128

marob avatar May 30 '24 17:05 marob

Looks very neat! Thank you so much!

@timmyteo @setchy Could you test this locally with some repos and share your thoughts?

prabhu avatar May 30 '24 17:05 prabhu

Great work @marob

Tested this branch against some of my repos. BOM looks good. Uploaded to DTrack 4.11.1 with BOM Validation enabled without issues

One observation: we may need to dedupe externalReferences after sanitization

for example

"externalReferences": [
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects"
    },
    {
        "type": "vcs",
        "url": "[email protected]:follow-redirects/follow-redirects.git"
    }
],

became

"externalReferences": [
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects.git"
    },
    {
        "type": "vcs",
        "url": "https://github.com/follow-redirects/follow-redirects.git"
    }
],

setchy avatar May 30 '24 18:05 setchy

Any ideas about the test failures? We may have to run it locally to troubleshoot, since the validation errors are not shown.

prabhu avatar May 31 '24 09:05 prabhu

Any ideas about the test failures? We may have to run it locally to troubleshoot, since the validation errors are not shown.

I have no idea what those tests are doing. On all the ones that fail, only this one is showing some details on the reason why: https://github.com/CycloneDX/cdxgen/actions/runs/9306982345/job/25617330208?pr=1130 It looks like it's linked to strict JSON Schema validation on iri-reference.

marob avatar May 31 '24 13:05 marob

'scm:svn:https://svn.codehaus.org/jettison/tags/jettison-1.3.7' - could try my idea of using the url parse instead of regex? Or check if the value starts with http?

prabhu avatar May 31 '24 17:05 prabhu

Can I come up with another branch that doesn't use additional dependencies? If we have more test cases, that will help.

prabhu avatar Jun 01 '24 12:06 prabhu

@marob Do you have time to look at the pending comments?

prabhu avatar Jun 03 '24 14:06 prabhu

@marob Do you have time to look at the pending comments?

@prabhu Probably not this week.

marob avatar Jun 03 '24 15:06 marob

Blocked by: #1134

prabhu avatar Jun 03 '24 20:06 prabhu

We now have an implementation with validation alone that is merged in master. It currently logs the problematic urls and the then filters them. Let's work on the sanitize feature in a separate branch.

prabhu avatar Jun 06 '24 07:06 prabhu

@marob can you create a new branch to implement the sanitize feature for urls?

prabhu avatar Jun 11 '24 17:06 prabhu

@prabhu Should we create a branch to sanitize externalReference URLs? Do we need that feature?

marob avatar Sep 02 '24 14:09 marob

@marob definitely feel free to work on a new branch, although this is a lower priority atm due to lack of any requests from the community.

prabhu avatar Sep 02 '24 19:09 prabhu