Add support for introduced and fixed commits in AdvisoryData
- Introduce `introduced_by_commits` and `fixed_by_commits` fields in our advisory
- Update `from_dict` and `to_dict` methods
- Create a `CodePatchData` importer class
- #2022
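A minimal sketch of what the two new fields and the `to_dict`/`from_dict` round trip could look like. `CommitSketch` and `AdvisorySketch` are hypothetical stand-ins, not vulnerablecode's actual `AdvisoryData` classes, and the layout of a commit (`vcs_url`, `commit_hash`) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class CommitSketch:
    """Hypothetical commit record: repository URL plus commit hash."""
    vcs_url: str
    commit_hash: str

    def to_dict(self) -> Dict[str, str]:
        return {"vcs_url": self.vcs_url, "commit_hash": self.commit_hash}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "CommitSketch":
        return cls(vcs_url=data["vcs_url"], commit_hash=data["commit_hash"])


@dataclass
class AdvisorySketch:
    """Hypothetical advisory carrying the two new commit lists."""
    advisory_id: str
    introduced_by_commits: List[CommitSketch] = field(default_factory=list)
    fixed_by_commits: List[CommitSketch] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return {
            "advisory_id": self.advisory_id,
            "introduced_by_commits": [c.to_dict() for c in self.introduced_by_commits],
            "fixed_by_commits": [c.to_dict() for c in self.fixed_by_commits],
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AdvisorySketch":
        # .get(..., []) keeps dicts serialized before these fields existed loadable
        return cls(
            advisory_id=data["advisory_id"],
            introduced_by_commits=[
                CommitSketch.from_dict(c) for c in data.get("introduced_by_commits", [])
            ],
            fixed_by_commits=[
                CommitSketch.from_dict(c) for c in data.get("fixed_by_commits", [])
            ],
        )
```

Defaulting missing keys to empty lists is one way to stay compatible with previously stored advisories; whether the real code needs that depends on how existing data is serialized.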
@ziadhany please add a description to the PR!
@ziadhany mostly looks good! Please run the importer once and paste the logs here. Thanks!
I want to see whether we are missing any data in the OSV format, and what AdvisoryData and ImpactedPackages look like with the new CommitData. Thanks!
@TG1999 This is the log output for the following importers:
- pysec_importer_v2
- pypa_importer_v2
- oss_fuzz_importer_v2
The database query results:
- `vulnerabilities_advisoryv2`: 10274 rows
- `vulnerabilities_impactedpackage_fixed_by_commits`: 4013 rows
- `vulnerabilities_impactedpackage_affecting_commits`: 3623 rows
- `vulnerabilities_codecommit`: 3791 rows
@ziadhany
Invalid VersionRange for affected_pkg: {'package': {'name': 'apache-commons-io', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-io'}, 'ranges': [{'type': 'GIT', 'repo': 'https://github.com/apache/commons-io.git', 'events': [{'introduced': '72b1f88fb722def136ce87c9b2bfdd3c9126bb3d'}, {'fixed': 'd3e5bd6de8bc96abbadccea8b934dc038a32e90c'}]}], 'versions': ['commons-io-2.14.0-RC1', 'rel/commons-io-2.14.0'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'introduced_range': 'c511d15294d1a406a177368804014313948e2601:06fde31494c279ad940149e1a3d4944040c73c0d', 'fixed_range': '247c8e7d85a8df293011c7e9c94fd50bb2986fb7:d3e5bd6de8bc96abbadccea8b934dc038a32e90c'}} for OSV id: 'OSV-2023-962': error:InvalidVersion("'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>")
Invalid VersionRange for affected_pkg: {'package': {'name': 'apache-commons-io', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-io'}, 'ranges': [{'type': 'GIT', 'repo': 'https://github.com/apache/commons-io.git', 'events': [{'introduced': '72b1f88fb722def136ce87c9b2bfdd3c9126bb3d'}, {'fixed': 'd3e5bd6de8bc96abbadccea8b934dc038a32e90c'}]}], 'versions': ['commons-io-2.14.0-RC1', 'rel/commons-io-2.14.0'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'introduced_range': 'c511d15294d1a406a177368804014313948e2601:06fde31494c279ad940149e1a3d4944040c73c0d', 'fixed_range': '247c8e7d85a8df293011c7e9c94fd50bb2986fb7:d3e5bd6de8bc96abbadccea8b934dc038a32e90c'}} for OSV id: 'OSV-2023-618': error:InvalidVersion("'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>")
Why are we getting these in the logs? The commit data should have been created for these.
See all the Invalid VersionRange errors. Why are these occurring?
{'package': {'name': 'apache-commons-codec', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-codec'}, 'ranges': [{'type': 'GIT', 'repo': 'https://gitbox.apache.org/repos/asf/commons-codec.git', 'events': [{'introduced': '44e4c4d778c3ab87db09c00e9d1c3260fd42dad5'}, {'fixed': '3bf874e2141dc08550c0b330c7a7006f358bb0f0'}]}], 'versions': ['commons-codec-1.16.1-RC1', 'rel/commons-codec-1.16.1'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'fixed_range': '72c40fe6f62410bcaa019dbf2cb570ee4e49b70e:3bf874e2141dc08550c0b330c7a7006f358bb0f0'}} for OSV id: 'OSV-2023-1195': error:InvalidVersion("'commons-codec-1.16.1-RC1' is not a valid <class 'univers.versions.SemverVersion'>")
We have introduced and fixed events here, which is what we need to create code commit data.
I updated the script to handle unsupported packages (especially for OSS-Fuzz). CodeCommit is no longer ignored even if the package is unsupported, and logs are now more meaningful.
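A sketch of the idea, assuming the OSV `affected` entry shape shown in the log lines above: commits are read from `GIT` ranges without consulting the package type or PURL, so an unsupported package no longer drops its commits. `extract_git_commits` is a hypothetical helper, not the importer's actual function.

```python
from typing import Any, Dict, List, Tuple

Commit = Tuple[str, str]  # (repo_url, commit_hash)


def extract_git_commits(affected_pkg: Dict[str, Any]) -> Tuple[List[Commit], List[Commit]]:
    """Collect introduced/fixed commits from the GIT ranges of one OSV "affected" entry.

    The package type and PURL are deliberately not checked, so commits survive
    even when the package itself is unsupported.
    """
    introduced: List[Commit] = []
    fixed: List[Commit] = []
    for version_range in affected_pkg.get("ranges", []):
        if version_range.get("type") != "GIT":
            continue
        repo = version_range.get("repo")
        if not repo:
            continue
        for event in version_range.get("events", []):
            # "0" is OSV's "from the beginning" marker, not a real commit
            if "introduced" in event and event["introduced"] != "0":
                introduced.append((repo, event["introduced"]))
            elif "fixed" in event:
                fixed.append((repo, event["fixed"]))
    return introduced, fixed
```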
These are the updated logs: importers_v2.zip
The database query results:
- `vulnerabilities_advisoryv2`: 17041 rows
- `vulnerabilities_impactedpackage_fixed_by_commits`: 7343 rows
- `vulnerabilities_impactedpackage_affecting_commits`: 6553 rows
- `vulnerabilities_codecommit`: 6553 rows
Related issues:
- pysec_importer_v2 / pypa_importer_v2:
- https://github.com/aboutcode-org/univers/issues/174
- https://github.com/aboutcode-org/vulnerablecode/issues/2019
- oss_fuzz_importer_v2
- `Unsupported package type: None in OSV: 'OSV-2021-1227'`: this means the package type is unknown (e.g., generic) and there is no PURL associated with it.
- `Invalid VersionRange for affected_pkg`: this depends on whether the version is valid for its version scheme (for example, semver). Example: `SemverVersion('commons-io-2.14.0-RC1')` raises `univers.versions.InvalidVersion: 'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>`
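To see why these OSS-Fuzz versions are rejected, here is a stdlib check using the regular expression published with the Semantic Versioning 2.0.0 specification. It stands in for univers' `SemverVersion`, whose parser may differ in details; `is_semver` is a hypothetical helper.

```python
import re

# Regular expression from the Semantic Versioning 2.0.0 specification (semver.org),
# used with fullmatch() instead of ^...$ anchors.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(version: str) -> bool:
    """True if the whole string is a valid SemVer 2.0.0 version."""
    return SEMVER_RE.fullmatch(version) is not None
```

The OSS-Fuzz `versions` entries in these logs look like git tag names, so the `commons-io-` prefix alone makes them invalid semver, while the bare `2.14.0-RC1` would pass.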
ERROR 2025-11-11 13:34:49.213781 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json
This log doesn't tell me much: what is the data, and why is it unsupported?
Invalid VersionRange for affected_pkg: ['0.8', '0.9', '0.9.3', '0.9.4', '0.9.5', '0.9.6', '0.9.7', '0.9.8', '0.9.9', '2.0.1', '2.0.1rc1', '2.0.1rc2-git', '2.0.1rc3', '2.0.1rc4', '2.0.2', '2.0.3', '2.0.4', '2.0.5', '2.0b4', '2.0b5', '2.0b6', '2.0b7', '2.0b8', '2.0b9', '3.0.0', '3.0.0b1', '3.0.0b2', '3.0.1', '3.0.2', '3.0.3', '3.0.4', '3.0.5', '3.1', '3.2', '3.2.1', '3.2.2', '3.2.3', '3.2.4', '3.2.5', '3.3', '3.4', '3.4.1', '3.4.2', '3.4.3', '3.4.4', '3.4.5', '3.5', '3.5b1', '3.6', '3.6.1', '3.6.2', '3.6.3', '3.6.4'] for OSV id: 'PYSEC-2021-859': error:InvalidVersion("'2.0.1rc2-git' is not a valid <class 'univers.versions.PypiVersion'>")
One version in the list might be invalid while all the others are valid. Are we ingesting the valid ones, or skipping the whole list if we can't ingest one?
ERROR 2025-11-11 13:34:49.213781 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json
This log does not tell me a lot, what's the data. Why this is unsupported.
@TG1999 We are ignoring GHSA files since we target only PYSEC files. https://github.com/aboutcode-org/vulnerablecode/blob/main/vulnerabilities/pipelines/v2_importers/pysec_importer.py#L54
Then add that to the log as well :)
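A sketch of a skip message that carries its reason, assuming the pipeline decides by the `PYSEC-` file-name prefix (the actual check lives in the linked `pysec_importer.py` and may differ). `skip_reason`, `should_import`, and `SUPPORTED_PREFIX` are hypothetical names.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical: the advisory ID prefix this pipeline imports.
SUPPORTED_PREFIX = "PYSEC-"


def skip_reason(path: Path) -> Optional[str]:
    """Return why `path` is skipped, or None if it should be imported."""
    if not path.name.startswith(SUPPORTED_PREFIX):
        return (
            f"{path.name}: only {SUPPORTED_PREFIX}* advisories are imported "
            "by this pipeline, so this file is ignored"
        )
    return None


def should_import(path: Path) -> bool:
    reason = skip_reason(path)
    if reason is not None:
        # The log line now states the reason, not just the file name.
        logger.error("Unsupported PyPI advisory data file: %s", reason)
        return False
    return True
```

The sketch keeps the existing ERROR level; whether an expected skip should be logged at a lower level is a separate choice.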
Invalid VersionRange for affected_pkg: ['0.8', '0.9', '0.9.3', '0.9.4', '0.9.5', '0.9.6', '0.9.7', '0.9.8', '0.9.9', '2.0.1', '2.0.1rc1', '2.0.1rc2-git', '2.0.1rc3', '2.0.1rc4', '2.0.2', '2.0.3', '2.0.4', '2.0.5', '2.0b4', '2.0b5', '2.0b6', '2.0b7', '2.0b8', '2.0b9', '3.0.0', '3.0.0b1', '3.0.0b2', '3.0.1', '3.0.2', '3.0.3', '3.0.4', '3.0.5', '3.1', '3.2', '3.2.1', '3.2.2', '3.2.3', '3.2.4', '3.2.5', '3.3', '3.4', '3.4.1', '3.4.2', '3.4.3', '3.4.4', '3.4.5', '3.5', '3.5b1', '3.6', '3.6.1', '3.6.2', '3.6.3', '3.6.4'] for OSV id: 'PYSEC-2021-859': error:InvalidVersion("'2.0.1rc2-git' is not a valid <class 'univers.versions.PypiVersion'>")
One of the list might not be a valid version, but all others are valid, are we ingesting them or skipping whole list if we can't ingest one.
We are skipping this since the version range would likely be inconsistent if we processed it. I also created a related issue in univers:
- https://github.com/aboutcode-org/univers/issues/174
I can change this if needed.
@keshav-space @pombredanne, thoughts on this one?
For PYSEC data we should use the GitHub version range, because the versions are semver. If a version is not parsable, only that version should be skipped, not the entire range. We should also introduce a flag for advisories that were not completely parsed, so that if our parsing techniques get better in the future, we can replace the incompletely parsed advisory with a new one.
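A sketch of the per-version skipping and the completeness flag proposed above, assuming a parser callable that raises on invalid input (for example a univers version class such as `PypiVersion`, whose `InvalidVersion` error appears in the logs; pass that exception type via `errors` if it is not a `ValueError`). `ParsedVersions`, `is_complete`, `parse_versions`, and `toy_pypi_parse` are hypothetical names; `toy_pypi_parse` is a deliberately simplified stand-in, not PEP 440.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple, Type


@dataclass
class ParsedVersions:
    valid: List[Any] = field(default_factory=list)
    invalid: List[str] = field(default_factory=list)

    @property
    def is_complete(self) -> bool:
        # Hypothetical "fully parsed" flag: False means some versions were dropped,
        # so the advisory is a candidate for re-import once parsing improves.
        return not self.invalid


def parse_versions(
    raw_versions: Iterable[str],
    parse: Callable[[str], Any],
    errors: Tuple[Type[Exception], ...] = (ValueError,),
) -> ParsedVersions:
    """Parse each version on its own; a bad one is skipped, not the whole list."""
    result = ParsedVersions()
    for raw in raw_versions:
        try:
            result.valid.append(parse(raw))
        except errors:
            result.invalid.append(raw)
    return result


_TOY_VERSION_RE = re.compile(r"\d+(?:\.\d+)*(?:(?:a|b|rc)\d+)?")


def toy_pypi_parse(raw: str) -> str:
    """Simplified stand-in parser for this example only."""
    if not _TOY_VERSION_RE.fullmatch(raw):
        raise ValueError(f"{raw!r} is not a valid version")
    return raw
```

For example, parsing `["2.0.1rc1", "2.0.1rc2-git", "2.0.1rc3"]` with `toy_pypi_parse` keeps the two valid versions, records `2.0.1rc2-git` as invalid, and reports `is_complete` as `False`.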
This is the log output for the following importers:
- pysec_importer_v2
- pypa_importer_v2
- oss_fuzz_importer_v2
- github_osv_importer_v2
Failed to extract fixed commits: ValueError('Commit must be a valid a commit_hash.')
We need to know the hash here. Please log the hash as well.
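A minimal sketch of validation that puts the offending value in the error message, so the log line shows which hash failed. `validate_commit_hash` is hypothetical, and the accepted format (7 to 64 hex characters, covering abbreviated, SHA-1, and SHA-256 hashes) is an assumption.

```python
import re
from typing import Any

# Assumption: abbreviated (>= 7 chars) up to full SHA-256 (64 chars) hex hashes are accepted.
_COMMIT_HASH_RE = re.compile(r"[0-9a-fA-F]{7,64}")


def validate_commit_hash(commit_hash: Any) -> str:
    """Return the hash unchanged if valid, else raise with the bad value included."""
    if not isinstance(commit_hash, str) or not _COMMIT_HASH_RE.fullmatch(commit_hash):
        # repr() makes non-string or whitespace-laden values visible in the log.
        raise ValueError(f"Commit must be a valid commit hash, got: {commit_hash!r}")
    return commit_hash
```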
Unsupported severity type: {'type': 'CVSS_V4', 'score': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N'} for OSV id: 'PYSEC-2024-154'
Why is this score not supported?
I have a PR for this
- #1974
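A sketch of accepting the `CVSS_V4` type, assuming the importer only needs to keep the vector and its individual metrics. `SUPPORTED_SEVERITY_TYPES` and `parse_cvss_vector` are hypothetical, and #1974 may take a different approach; computing a CVSS v4.0 base score is a separate step not attempted here.

```python
from typing import Dict

# Hypothetical allow-list of OSV severity types; "CVSS_V4" is the one the log rejects.
SUPPORTED_SEVERITY_TYPES = {"CVSS_V3", "CVSS_V4"}


def parse_cvss_vector(vector: str) -> Dict[str, str]:
    """Split a vector such as "CVSS:4.0/AV:N/..." into {"version": "4.0", "AV": "N", ...}."""
    prefix, _, rest = vector.partition("/")
    if not prefix.startswith("CVSS:") or not rest:
        raise ValueError(f"Not a CVSS vector: {vector!r}")
    metrics = {"version": prefix[len("CVSS:"):]}
    for part in rest.split("/"):
        key, sep, value = part.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"Malformed metric {part!r} in {vector!r}")
        metrics[key] = value
    return metrics
```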
In the above example, `af2.affected_version_range` shouldn't be `None`; we are completely missing 8.0.0. We should also update the OSV tests to include a case that checks the parsed advisory data for introduced and fixed versions.
We were ignoring the `introduced` field (not sure why) and computing the `affected_version_range` from the `versions` field instead.
Example: https://ossf.github.io/osv-schema/#ruby-vulnerability
We should create an issue for that and implement a fallback.
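To illustrate what reading the `introduced` field would give us, a minimal sketch that turns OSV `ECOSYSTEM`/`SEMVER` range events into comparator strings, so a lower bound such as 8.0.0 is not lost. `constraints_from_events` is hypothetical; real code would presumably build a univers range object rather than strings. OSV's special `introduced: "0"` (affected from the first version) emits no lower bound.

```python
from typing import Dict, List


def constraints_from_events(events: List[Dict[str, str]]) -> List[str]:
    """Map OSV range events to comparator strings, in event order."""
    constraints: List[str] = []
    for event in events:
        if "introduced" in event:
            if event["introduced"] != "0":  # "0" means no lower bound
                constraints.append(f">={event['introduced']}")
        elif "fixed" in event:
            constraints.append(f"<{event['fixed']}")
        elif "last_affected" in event:
            constraints.append(f"<={event['last_affected']}")
    return constraints
```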
@ziadhany
We should create an issue for that and implement a fallback.
A fallback is meant to handle unforeseen cases; what we have here is a bug in the core OSV parsing logic, which needs to be fixed.
We should not change the broken parsing logic in this PR, as that will only pile up our technical debt. We should split this PR into two: this PR was only meant to add support for affected and fixed commits in our AdvisoryData and the corresponding change in our insert advisory function, in line with the model changes introduced in #2007. The second PR should fix the OSV parsing and collect commits.
@ziadhany we need either a test pipeline or an actual code commit collection pipeline to see how we are doing. Also, nothing in the current code refers to SUPPORTED_CODE_COMMIT types yet.
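One possible shape for an explicit supported-types check, assuming support is decided by VCS host. The `SUPPORTED_CODE_COMMIT_TYPES` mapping and `code_commit_type` are hypothetical; the values are the existing purl types for those hosts.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical: VCS hosts we know how to turn into code-commit records,
# mapped to the corresponding purl type.
SUPPORTED_CODE_COMMIT_TYPES = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}


def code_commit_type(vcs_url: str) -> Optional[str]:
    """Return the purl type for a supported VCS URL, or None if unsupported."""
    host = (urlparse(vcs_url).hostname or "").lower()
    return SUPPORTED_CODE_COMMIT_TYPES.get(host)
```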
@TG1999 Okay, I will add the aosp_dataset importer (https://github.com/quarkslab/aosp_dataset) since it has some diversity in vcs_urls.
url2purl currently fails to parse commit hashes correctly:
- https://github.com/package-url/packageurl-python/issues/210
I’ve opened a PR with a fix:
- https://github.com/package-url/packageurl-python/pull/211
[
    {"url": "https://github.com/abc/def", "commit_hash": None, "patch_text": None},
    {"url": "https://github.com/abc/def", "commit_hash": None, "patch_text": "+1-2"},
    {"url": "https://github.com/abc/def", "commit_hash": "1213", "patch_text": None},
    {"url": "https://github.com/abc/def", "commit_hash": "1213", "patch_text": "+1-2"},
    {"url": "https://github.com/abc/def/commit/12323", "commit_hash": None, "patch_text": None},
    {"url": "https://github.com/abc/def/commit/12323", "commit_hash": None, "patch_text": "+1-2"},
    {"url": "https://github.com/abc/def/commit/12323", "commit_hash": "12323", "patch_text": None},
    {"url": "https://github.com/abc/def/commit/12323", "commit_hash": "12323", "patch_text": "+1-2"},
    {"url": "https://unknown.com/abc/def", "commit_hash": None, "patch_text": None},
    {"url": "https://unknown.com/abc/def", "commit_hash": None, "patch_text": "+1-2"},
    {"url": "https://unknown.com/abc/def", "commit_hash": "113324", "patch_text": None},
    {"url": "https://unknown.com/abc/def", "commit_hash": "113324", "patch_text": "+1-2"},
    {"url": "https://unknown.com/abc/def/123434", "commit_hash": None, "patch_text": None},
    {"url": "https://unknown.com/abc/def/123434", "commit_hash": None, "patch_text": "+1-2"},
    {"url": "https://unknown.com/abc/def/123434", "commit_hash": "123434", "patch_text": None},
    {"url": "https://unknown.com/abc/def/123434", "commit_hash": "123434", "patch_text": "+1-2"},
]
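A sketch of how a test could normalize each fixture row into a repository URL plus commit hash, assuming only `/commit/<hash>` URLs carry a hash in their path. `resolve_commit` is hypothetical and does not use `url2purl`; rows on unknown hosts keep their URL unchanged rather than guessing a hash from the last path segment.

```python
import re
from typing import Optional, Tuple

# Assumption: commit URLs look like https://<host>/<owner>/<repo>/commit/<hash>
_COMMIT_URL_RE = re.compile(
    r"^(?P<repo>https?://[^/]+/[^/]+/[^/]+)/commit/(?P<hash>[0-9a-fA-F]+)/?$"
)


def resolve_commit(url: str, commit_hash: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (repository_url, commit_hash) for one fixture row."""
    match = _COMMIT_URL_RE.match(url)
    if match:
        url_hash = match.group("hash")
        # An explicit hash must agree with the one embedded in the URL.
        if commit_hash is not None and str(commit_hash) != url_hash:
            raise ValueError(f"commit hash {commit_hash!r} does not match URL {url!r}")
        return match.group("repo"), url_hash
    # No hash in the URL path: keep the URL and whatever hash was given.
    return url, (None if commit_hash is None else str(commit_hash))
```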
@ziadhany excellent job! One thing I'd like is to use these test fixtures as mock advisories. I want to see how we are forming ImpactedPackages (base_purl, packagecommitpatches (introduced/fixed)), ReferencesV2, and Patch. I want to see something end to end: how we are forming and storing advisories, and how we are relating advisories to references, impacted_packages, and patches.
Thanks!
@TG1999 I added an end-to-end test. Can you please have a look?
@keshav-space @TG1999 I’ve added the requested changes. Please have a look and let me know if I should update anything.
AospImporterPipeline Logs:
Importing data using aosp_dataset_fix_commits
INFO 2025-12-11 09:52:54.675302 UTC Pipeline [AospImporterPipeline] starting
INFO 2025-12-11 09:52:54.675437 UTC Step [clone] starting
INFO 2025-12-11 09:52:54.675488 UTC Cloning `git+https://github.com/quarkslab/aosp_dataset`
INFO 2025-12-11 09:52:57.279282 UTC Step [clone] completed in 3 seconds
INFO 2025-12-11 09:52:57.279391 UTC Step [collect_and_store_advisories] starting
INFO 2025-12-11 09:52:57.294661 UTC Collecting 1,994 advisories
INFO 2025-12-11 09:52:57.294748 UTC Processing aosp_dataset fix commits.
INFO 2025-12-11 09:52:58.749586 UTC Progress: 10% (200/1994) ETA: 13 seconds
INFO 2025-12-11 09:53:00.234658 UTC Progress: 20% (399/1994) ETA: 12 seconds
INFO 2025-12-11 09:53:01.706514 UTC Progress: 30% (599/1994) ETA: 10 seconds
INFO 2025-12-11 09:53:02.664337 UTC Progress: 40% (798/1994) ETA: 8 seconds
INFO 2025-12-11 09:53:04.016003 UTC Progress: 50% (997/1994) ETA: 7 seconds
INFO 2025-12-11 09:53:05.483900 UTC Progress: 60% (1197/1994) ETA: 5 seconds
INFO 2025-12-11 09:53:06.949434 UTC Progress: 70% (1396/1994) ETA: 4 seconds
INFO 2025-12-11 09:53:08.425321 UTC Progress: 80% (1596/1994) ETA: 3 seconds
INFO 2025-12-11 09:53:09.726271 UTC Progress: 90% (1795/1994) ETA: 1 seconds
INFO 2025-12-11 09:53:11.224522 UTC Successfully collected 1,992 advisories
INFO 2025-12-11 09:53:11.224660 UTC Step [collect_and_store_advisories] completed in 14 seconds
INFO 2025-12-11 09:53:11.224715 UTC Step [clean_downloads] starting
INFO 2025-12-11 09:53:11.224761 UTC Removing cloned repository
INFO 2025-12-11 09:53:11.254953 UTC Step [clean_downloads] completed in 0 seconds
INFO 2025-12-11 09:53:11.255088 UTC Pipeline completed in 17 seconds
Process finished with exit code 0
@ziadhany LGTM! please rebase and adjust the migrations! great work :raised_hands:
@TG1999 Done. Please merge 🚀