django-DefectDojo
deduplication is not working
Hey,
I have a problem with deduplication. I use the trivy-dojo-report-operator to import my reports into DefectDojo, but I keep getting clones of vulnerabilities that differ only in creation time and description.
I enabled deduplication in DefectDojo and set the maximum number of duplicates to 0. I think the issue could be the description field: it contains our resource name, which ends with a hash that changes every time we deploy. I already tried changing the deduplication algorithm, but nothing has worked so far. Is there a workaround?
I looked into the logs of the deployed DefectDojo pods but didn't see any errors.
Here are the values of one of the findings that have not been recognized as duplicates:
Title: CVE-2024-7254 com.google.protobuf:protobuf-java 3.25.4 (same for both)
Product name: Testrun (same for both)
Service name: Testrun (same for both)
Component Version: 3.25.4 (same for both)
Component Name: com.google.protobuf:protobuf-java (same for both)
Vulnerability Ids: CVE-2024-7254 (same for both)
Severity: high (same for both)
Description:
protobuf: StackOverflow vulnerability in Protocol Buffers (same for both)
Fixed version: 3.25.5, 4.27.5, 4.28.2 (same for both)
container.name: Testrun (same for both)
resource.kind: ReplicaSet (same for both)
resource.name: Testrun-5b66c55585 (the hash suffix differs between the two findings)
resource.namespace: dev (same for both)
django-DefectDojo version (Docker): 2.42.0-alpine
Helm version: 1.6.183
The default dedupe config for the Trivy Operator parser is:
"Trivy Operator Scan": ["title", "severity", "vulnerability_ids", "description"],
After changing it, the hash_codes can be recalculated via:
docker compose exec uwsgi /bin/bash -c "python manage.py dedupe --parser 'Trivy Operator Scan' --hash_code_only"
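To illustrate why the "description" field breaks deduplication here, the following is a minimal sketch. It is not DefectDojo's actual hash_code implementation: the function `illustrative_hash`, the helper `make_finding`, and the second resource name are hypothetical and only show the principle that a hash over the configured fields changes as soon as any one of them changes.

```python
import hashlib


def illustrative_hash(finding: dict, fields: list) -> str:
    """Hash the selected fields of a finding.

    A stand-in for illustration only, not DefectDojo's real hash_code logic.
    """
    joined = "|".join(str(finding[name]) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def make_finding(resource_name: str) -> dict:
    """Build a finding like the one in this issue, varying only resource.name."""
    return {
        "title": "CVE-2024-7254 com.google.protobuf:protobuf-java 3.25.4",
        "severity": "High",
        "vulnerability_ids": "CVE-2024-7254",
        "description": (
            "resource.kind: ReplicaSet\n"
            f"resource.name: {resource_name}\n"
            "resource.namespace: dev"
        ),
    }


first = make_finding("Testrun-5b66c55585")
second = make_finding("Testrun-7d9f8c6b4a")  # hypothetical suffix after a redeploy

with_description = ["title", "severity", "vulnerability_ids", "description"]
without_description = ["title", "severity", "vulnerability_ids"]

same_with = (
    illustrative_hash(first, with_description)
    == illustrative_hash(second, with_description)
)
same_without = (
    illustrative_hash(first, without_description)
    == illustrative_hash(second, without_description)
)

print(same_with)     # False: the changing resource name ends up in the hash
print(same_without)  # True: the remaining fields are identical
```

With "description" included, every redeploy produces a new hash, so the two findings are never matched as duplicates.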
Thanks @valentijnscholten, I'm a colleague of phuget. This seems to be working. I actually found this before your reply by reading through different issues on GitHub and the Markdown files they link to. Might I suggest adding this information to the official documentation in the deduplication section here: https://docs.defectdojo.com/en/working_with_findings/finding_deduplication/about_deduplication/
We had trouble understanding what parsers do, how they are connected to Tests, and how hash codes are involved. It was not obvious that the key for each parser corresponds to the "Test Type". I assumed the key was a typo, since spaces in the keys of key-value mappings are rare. We configured the HASHCODE_FIELDS_PER_SCANNER value for "Trivy Operator Scan" without the "description" field and regenerated the hash_codes.
None of this is mentioned or linked in the documentation linked above.
We found the information we needed in this document and its subsequent chapters: https://github.com/DefectDojo/django-DefectDojo/blob/master/docs/content/en/open_source/archived_docs/usage/features.md#deduplication-algorithms My problem with the location is that it is part of the "archived_docs" folder, where I would assume the information to be outdated.
All in all, we spent about 2-3 hours searching for this.
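For anyone landing here later, the change described above can be sketched roughly as follows. This is an assumption-laden sketch, not an official recipe: the dictionary below is a stand-in that reproduces only the one default entry quoted in this thread, and where the override lives (for example a local settings file) depends on your deployment.

```python
# Stand-in for the defaults shipped with DefectDojo. Only the entry quoted in
# this thread is reproduced; the real setting contains many more parsers.
HASHCODE_FIELDS_PER_SCANNER = {
    "Trivy Operator Scan": ["title", "severity", "vulnerability_ids", "description"],
}

# The key must match the Test Type name exactly, including the spaces.
TEST_TYPE = "Trivy Operator Scan"

# Drop "description" because it contains the resource name with a changing hash.
HASHCODE_FIELDS_PER_SCANNER[TEST_TYPE] = [
    field
    for field in HASHCODE_FIELDS_PER_SCANNER[TEST_TYPE]
    if field != "description"
]

print(HASHCODE_FIELDS_PER_SCANNER[TEST_TYPE])
# ['title', 'severity', 'vulnerability_ids']
```

Existing findings keep their old hash_codes until they are recalculated, which is why the hash_codes had to be regenerated after the change.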
copying in @paulOsinski
@MPritsch How well does the new hash_code configuration for Trivy Operator Scan work?
Tuning the deduplication settings is a bit of a double-edged sword. It's nice to have flexibility, but if all users change these settings, it becomes hard to provide support, especially if users don't tell us they changed them.
But I do think it's a good idea to document it better, which is what we did in https://github.com/DefectDojo/django-DefectDojo/pull/13464