
Add malware from https://github.com/ossf/malicious-packages

Open pombredanne opened this issue 1 year ago • 3 comments

https://github.com/ossf/malicious-packages will need special treatment because these packages are yanked, and NVD does not assign CVEs to them.

pombredanne avatar Jan 29 '24 14:01 pombredanne

Hi team, I would like to work on this issue if that's ok

The high-level approach to add this importer would be as follows:

  1. Clone the GitHub repo and search for all JSON files in the malicious folder using glob
  2. Parse the JSON files. Since they follow the OSV format, we can directly use the OSV importer
  3. Yield AdvisoryData

A code snippet that implements this without using classes would be as follows:

import json
from pathlib import Path

from vulnerabilities.importers.osv import parse_advisory_data
from vulnerabilities.utils import get_advisory_url


license_url = "https://github.com/ossf/malicious-packages/blob/main/LICENSE"
spdx_license_expression = "CC-BY-4.0"
url = "git+https://github.com/ossf/malicious-packages"
importer_name = "OpenSSF Malicious Packages Importer"


def advisory_data():
    supported_ecosystem = "npm"

    # clone() is assumed to be a helper that checks out the repository and
    # returns a response exposing dest_dir; it is not defined in this snippet.
    vcs_response = clone(repo_url=url)
    base_path = Path(vcs_response.dest_dir)
    path = base_path / "osv" / "malicious" / supported_ecosystem
    for file in path.glob("**/*.json"):
        with open(file, "r") as f:
            json_data = json.load(f)
        advisory_url = get_advisory_url(
            file=file,
            base_path=base_path,
            url="https://github.com/ossf/malicious-packages/blob/main",
        )
        # Yield the parsed AdvisoryData (step 3 above).
        yield parse_advisory_data(
            json_data,
            supported_ecosystem=supported_ecosystem,
            advisory_url=advisory_url,
        )

I have most of the code ready and will open a PR soon if that works.

shravankshenoy avatar Feb 01 '24 06:02 shravankshenoy

One point to consider is that the OSV importer only supports one ecosystem at a time, whereas this repository covers four ecosystems (crates.io, npm, pypi, rubygems).

In all the other importers that use the OSV importer, such as oss_fuzz.py, pypa.py or pysec.py, there is only one supported ecosystem. The simplest way to handle several would be to create a list and loop through it, as in the snippet below:

base_path = Path(vcs_response.dest_dir)
supported_ecosystems = ["crates.io", "npm", "pypi", "rubygems"]
for supported_ecosystem in supported_ecosystems:
    path = base_path / "osv" / "malicious" / supported_ecosystem
    # Rest of the code
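
To make the loop concrete, here is a minimal, self-contained sketch that walks a local checkout and yields the parsed JSON for every ecosystem. It assumes the `osv/malicious/<ecosystem>` layout described in this thread; the function name `iter_osv_records` and the constant `SUPPORTED_ECOSYSTEMS` are made up for illustration and are not part of VulnerableCode.

```python
import json
from pathlib import Path
from typing import Iterator, Sequence, Tuple

# Ecosystem directory names as listed in this thread.
SUPPORTED_ECOSYSTEMS = ("crates.io", "npm", "pypi", "rubygems")


def iter_osv_records(
    base_path: Path, ecosystems: Sequence[str] = SUPPORTED_ECOSYSTEMS
) -> Iterator[Tuple[str, Path, dict]]:
    """Yield (ecosystem, file path, parsed JSON) for each advisory file found.

    Hypothetical helper: assumes advisories live under
    base_path / "osv" / "malicious" / <ecosystem>.
    """
    for ecosystem in ecosystems:
        ecosystem_dir = base_path / "osv" / "malicious" / ecosystem
        if not ecosystem_dir.is_dir():
            # Skip ecosystems that are absent from this checkout.
            continue
        # sorted() keeps the iteration order deterministic across runs.
        for file in sorted(ecosystem_dir.glob("**/*.json")):
            with open(file, encoding="utf-8") as f:
                yield ecosystem, file, json.load(f)
```

Each yielded tuple carries the ecosystem alongside the record, so the caller could pass it on as the `supported_ecosystem` argument of `parse_advisory_data` one file at a time.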

Whether there is a better way is worth considering. @pombredanne @TG1999
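
One possible alternative, offered only as a sketch: the OSV schema records the ecosystem of each affected package under `affected[].package.ecosystem`, so the ecosystem could be read from the record itself rather than from the directory name. The helper name `group_by_ecosystem` and the sample records below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_ecosystem(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group OSV records by the ecosystems named in their `affected` entries."""
    grouped = defaultdict(list)
    for record in records:
        # Collect the distinct ecosystems mentioned by this record.
        ecosystems = {
            affected.get("package", {}).get("ecosystem")
            for affected in record.get("affected", [])
        }
        # Drop missing values; a record may land in more than one group.
        for ecosystem in sorted(e for e in ecosystems if e):
            grouped[ecosystem].append(record)
    return dict(grouped)


# Hypothetical records, made up for illustration only.
sample = [
    {"id": "EXAMPLE-1", "affected": [{"package": {"ecosystem": "npm", "name": "a"}}]},
    {"id": "EXAMPLE-2", "affected": [{"package": {"ecosystem": "PyPI", "name": "b"}}]},
]
grouped = group_by_ecosystem(sample)
```

Note that OSV spells ecosystem names with its own casing (for example `PyPI` and `RubyGems`), so the values may need normalizing before they are matched against the lowercase directory names used above.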

shravankshenoy avatar Feb 01 '24 06:02 shravankshenoy

I have raised a PR to import data from OpenSSF malicious packages. Let me know if any changes are required. PR #1412

shravankshenoy avatar Feb 05 '24 09:02 shravankshenoy