Combine javascript package and lock catalogers
What happened:
I think Syft's JavaScript cataloger fails to detect and parse integrity properties inside package-lock.json. The cataloger successfully finds packages from node_modules/*/package.json files but ignores the package-lock.json file that contains SHA-512 integrity hashes.
What you expected to happen:
When scanning a Docker image containing a Node.js project with a package-lock.json file (lockfile version 3) that includes integrity hashes, Syft should capture the integrity property for each package and include it in the package metadata.
Steps to reproduce the issue:
- Create a simple Node.js project with package-lock.json containing integrity hashes:
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
// package.json
{
  "name": "test-project",
  "version": "1.0.0",
  "dependencies": {
    "rxjs": "^7.8.2"
  }
}
// package-lock.json (generated by npm install)
{
  "name": "test-project",
  "version": "1.0.0",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "node_modules/rxjs": {
      "version": "7.8.2",
      "resolved": "https://registry.npmjs.org/rxjs/-/rxjs-7.8.2.tgz",
      "integrity": "sha512-dhKf903U/PQZY6boNNtAGdWbG85WAbjT/1xYoZIC7FAY0yWapOBQVsVrDl58W86//e1VpMNBtRV4MaXfdMySFA==",
      "dependencies": {
        "tslib": "^2.1.0"
      }
    },
    "node_modules/tslib": {
      "version": "2.8.1",
      "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz",
      "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w=="
    }
  }
}
- Run
docker build -t test-image .
- Run (saving the output so it can be inspected in the last step)
syft test-image -o json > output.json
- Verify the package-lock.json exists
docker run --rm test-image ls -la /app/ (shows package-lock.json)
- Check package metadata:
jq '.artifacts[] | select(.name == "rxjs") | .metadata' output.json (no integrity field)
Actual Results:
- Syft finds packages from node_modules/*/package.json files
- Syft does NOT detect /app/package-lock.json in the files list
- Package metadata lacks integrity fields:
"metadata": {"name": "rxjs", "version": "7.8.2", ...} (no integrity)
Expected Results:
- Syft should detect /app/package-lock.json in the files list
- Package metadata should include integrity:
"metadata": {"name": "rxjs", "version": "7.8.2", "integrity": "sha512-dhKf903U...", ...}
Anything else we need to know?:
- The package-lock.json file is physically present in the image and readable
- Syft's codebase shows NpmPackageLockEntry and YarnLockEntry structs with Integrity fields, indicating this feature was intended to work
- Test fixtures in test-fixtures show expected behavior with integrity values in cataloger tests
- Modern npm (v7+) uses SHA-512 hashes by default, while older test fixtures show SHA-1 hashes
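Since the integrity values above carry the hash algorithm as a prefix (sha512-... in current lockfiles, sha1-... in the older fixtures), here is a minimal sketch of splitting such a value into its algorithm and a hex digest. The helper name parseIntegrity is hypothetical and not part of Syft, and the sketch assumes a single "<algorithm>-<base64 digest>" value rather than a space-separated list of hashes.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseIntegrity splits an npm integrity value of the form
// "<algorithm>-<base64 digest>" into the algorithm name and the
// hex-encoded digest. Hypothetical helper for illustration only;
// it assumes exactly one hash per value.
func parseIntegrity(integrity string) (string, string, error) {
	algorithm, encoded, found := strings.Cut(integrity, "-")
	if !found || algorithm == "" {
		return "", "", fmt.Errorf("malformed integrity value %q", integrity)
	}
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", "", fmt.Errorf("decoding %s digest: %w", algorithm, err)
	}
	return algorithm, hex.EncodeToString(raw), nil
}

func main() {
	// Integrity value for rxjs 7.8.2 as quoted in the package-lock.json above.
	alg, digest, err := parseIntegrity("sha512-dhKf903U/PQZY6boNNtAGdWbG85WAbjT/1xYoZIC7FAY0yWapOBQVsVrDl58W86//e1VpMNBtRV4MaXfdMySFA==")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s digest, %d hex characters\n", alg, len(digest))
}
```

Keeping the algorithm separate from the digest would let SHA-1 and SHA-512 entries be reported uniformly, whichever one a given lockfile happens to contain.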
Environment:
Application: syft
Version: 1.27.1
BuildDate: 2025-06-11T21:00:55Z
GitCommit: Homebrew
GitDescription: [not provided]
Platform: darwin/arm64
GoVersion: go1.24.4
Compiler: gc
SchemaVersion: 16.0.34
- OS: macOS Darwin 24.5.0 (arm64)
- Docker: Used for testing container images
- Node.js: v20 (in test Docker image)
- npm: Latest (generates lockfileVersion 3 with SHA-512 hashes)
This wasn't detected because we look for package.json files in image scans and package-lock.json files in directory scans:
$ syft cataloger list --select-catalogers javascript
Default selections: 1
• 'all'
Selection expressions: 1
• 'javascript' (intersect)
...
┌──────────────────────────────┬───────────────────────────────────────────────────────────────┐
│ PACKAGE CATALOGER │ TAGS │
├──────────────────────────────┼───────────────────────────────────────────────────────────────┤
│ javascript-lock-cataloger │ declared, directory, javascript, language, node, npm, package │
│ javascript-package-cataloger │ image, installed, javascript, language, node, package │
└──────────────────────────────┴───────────────────────────────────────────────────────────────┘
Why make this distinction? When looking within node_modules directories, every installed package is guaranteed to have a package.json, but there is no such guarantee for package-lock.json files:
$ find . | grep package.json
./package.json # <-- we can't depend on this since it's not from node_modules (can be cleaned up in the Dockerfile)
./node_modules/tslib/modules/package.json
./node_modules/tslib/package.json
./node_modules/rxjs/ajax/package.json
./node_modules/rxjs/fetch/package.json
./node_modules/rxjs/operators/package.json
./node_modules/rxjs/package.json
./node_modules/rxjs/testing/package.json
./node_modules/rxjs/webSocket/package.json
In contrast, there is no package-lock.json for each package (other than the one copied into the build container environment, which again cannot be depended on to be there).
However, the NPM tooling has begun to copy the package-lock.json into the node modules dir:
$ find . | grep package-lock.json
./node_modules/.package-lock.json
./package-lock.json # <-- we can't depend on this since it's not from node_modules (can be cleaned up in the Dockerfile)
The package.json has rich information such as license, version, and URL links, whereas the lock file has the integrity information. I think there is an opportunity to capture information from both the node_modules lock file and the package.json files in a new combined cataloger.
Today we treat lock and package json metadata as separate, but we could combine these:
// NpmPackage represents the contents of a javascript package.json file.
type NpmPackage struct {
    Name        string `mapstructure:"name" json:"name"`
    Version     string `mapstructure:"version" json:"version"`
    Author      string `mapstructure:"author" json:"author"`
    Homepage    string `mapstructure:"homepage" json:"homepage"`
    Description string `mapstructure:"description" json:"description"`
    URL         string `mapstructure:"url" json:"url"`
    Private     bool   `mapstructure:"private" json:"private"`
+   LockEntry   *NpmPackageLockEntry `mapstructure:"lock" json:"lock"`
}
// NpmPackageLockEntry represents a single entry within the "packages" section of a package-lock.json file.
type NpmPackageLockEntry struct {
    Resolved  string `mapstructure:"resolved" json:"resolved"`
    Integrity string `mapstructure:"integrity" json:"integrity"`
}
// YarnLockEntry represents a single entry section of a yarn.lock file.
type YarnLockEntry struct {
    Resolved  string `mapstructure:"resolved" json:"resolved"`
    Integrity string `mapstructure:"integrity" json:"integrity"`
}
We can add a new custom cataloger that looks for both node_modules/.package-lock.json and node_modules/**/package.json and correlates the findings into a single package graph, so that the lock and package entries form a single metadata object on the package. This will take some work, but we do have precedent for it (e.g. the deps.json and PE binary catalogers for .NET). This new cataloger would be used for both image and directory scans, and we would deprecate the existing lock and package catalogers.
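To make the correlation idea concrete, here is a rough sketch using only the standard library. It is not Syft's cataloger API: the types lockEntry, hiddenLock, and combinedPackage and the function correlate are hypothetical names for illustration. It also assumes that node_modules/.package-lock.json uses the same "packages" layout as the lockfile version 3 example in the report, keyed by install directory (e.g. "node_modules/rxjs"), so a package.json can be matched to its lock entry by directory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// lockEntry mirrors the two fields of NpmPackageLockEntry quoted above.
type lockEntry struct {
	Resolved  string `json:"resolved"`
	Integrity string `json:"integrity"`
}

// hiddenLock is the subset of node_modules/.package-lock.json used here
// (assumed layout: a "packages" object keyed by install directory).
type hiddenLock struct {
	Packages map[string]lockEntry `json:"packages"`
}

// combinedPackage is a hypothetical merged view of a package.json
// and its lock entry; Lock stays nil when no entry matches.
type combinedPackage struct {
	Name    string     `json:"name"`
	Version string     `json:"version"`
	Lock    *lockEntry `json:"lock,omitempty"`
}

// correlate parses each manifest (keyed by its path relative to the
// project root, e.g. "node_modules/rxjs/package.json") and attaches the
// lock entry whose key equals the manifest's directory. An empty lockJSON
// is allowed, since the lock file is not guaranteed to exist.
func correlate(lockJSON []byte, manifests map[string][]byte) (map[string]combinedPackage, error) {
	var lock hiddenLock
	if len(lockJSON) > 0 {
		if err := json.Unmarshal(lockJSON, &lock); err != nil {
			return nil, fmt.Errorf("parsing lock file: %w", err)
		}
	}
	out := make(map[string]combinedPackage, len(manifests))
	for manifestPath, contents := range manifests {
		var pkg combinedPackage
		if err := json.Unmarshal(contents, &pkg); err != nil {
			return nil, fmt.Errorf("parsing %s: %w", manifestPath, err)
		}
		dir := path.Dir(manifestPath)
		if entry, ok := lock.Packages[dir]; ok {
			e := entry // copy so each package holds its own pointer
			pkg.Lock = &e
		}
		out[dir] = pkg
	}
	return out, nil
}

func main() {
	lock := []byte(`{"packages": {"node_modules/rxjs": {
		"resolved": "https://registry.npmjs.org/rxjs/-/rxjs-7.8.2.tgz",
		"integrity": "sha512-dhKf903U..."}}}`)
	manifests := map[string][]byte{
		"node_modules/rxjs/package.json":  []byte(`{"name": "rxjs", "version": "7.8.2"}`),
		"node_modules/tslib/package.json": []byte(`{"name": "tslib", "version": "2.8.1"}`),
	}
	pkgs, err := correlate(lock, manifests)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rxjs has lock entry:", pkgs["node_modules/rxjs"].Lock != nil)
	fmt.Println("tslib has lock entry:", pkgs["node_modules/tslib"].Lock != nil)
}
```

Matching on the install directory rather than the package name keeps nested duplicates (the same package installed at two versions) distinct. Filtering out subpath manifests such as node_modules/rxjs/ajax/package.json is left out of this sketch.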
We do need to think through what this means for yarn, pnpm, etc. (and their metadata structs).
I would like to work on this if it is not assigned.