Unable to extract licenses for some NPM packages

Open atl-mk opened this issue 2 years ago • 7 comments

What happened: I ran Syft with SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true, and it logged warnings that it failed to fetch the licenses for some packages

What you expected to happen: To successfully fetch all licenses

Steps to reproduce the issue:

  1. Make a project with the dependencies below
  2. Run Syft with SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true

Anything else we need to know?:

  • I ran Syft twice and both times it gave the same warnings, implying there's an invalid assumption in the unmarshalling
  • Here are the warnings from running Syft
[0036]  WARN unable to extract licenses from javascript yarn.lock for package ansi-wrap:0.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0041]  WARN unable to extract licenses from javascript yarn.lock for package array-slice:0.2.3: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0042]  WARN unable to extract licenses from javascript yarn.lock for package array-unique:0.2.1: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0086]  WARN unable to extract licenses from javascript yarn.lock for package config-chain:1.1.13: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0134]  WARN unable to extract licenses from javascript yarn.lock for package glob-base:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0168]  WARN unable to extract licenses from javascript yarn.lock for package is-primitive:2.0.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0172]  WARN unable to extract licenses from javascript yarn.lock for package is-whitespace:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0187]  WARN unable to extract licenses from javascript yarn.lock for package kind-of:1.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0242]  WARN unable to extract licenses from javascript yarn.lock for package preserve:0.2.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string

As far as I can tell, they all have normal package.json files on GitHub, which makes this strange

Environment:

  • Output of syft version: syft 0.103.1
  • OS (e.g: cat /etc/os-release or similar):
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

atl-mk avatar Feb 08 '24 21:02 atl-mk

Hi @atl-mk, thanks for the report! I tried quickly to reproduce on the same version of Syft:

mkdir syft-2611 && cd syft-2611
yarn && yarn add array-slice
SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true syft . -o json

I don't see the warning you are seeing, and I see the MIT license in the JSON output:

      "licenses": [
        {
          "value": "MIT",
          "spdxExpression": "MIT",
          "type": "declared",
          "urls": [],
...

Can you share more detailed reproduction steps, maybe the full project you are scanning? Can you also try upgrading to the latest available Syft?

Thanks!

tgerla avatar Feb 15 '24 17:02 tgerla

@tgerla

Exactly the same even in a different project

Here's a simple package.json file

{
  "name": "test",
  "private": true,
  "dependencies": {
    "ansi-wrap": "0.1.0",
    "array-slice": "0.1.0",
    "glob-base": "0.3.0",
    "is-primitive": "2.0.0",
    "is-whitespace": "0.3.0",
    "kind-of": "1.1.0",
    "preserve": "0.2.0"
  }
}

Simply running yarn 1.22.19 produces the same output. I also upgraded to the latest version of Syft (0.103.1 was the latest version when I reported the bug), but the bug is still present

$ SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true SYFT_LOG_LEVEL=info syft . -o syft-json=sbom.cyclonedx.json
[0000]  INFO syft version: 0.103.1
[0000]  WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
[0000]  WARN unable to extract licenses from javascript yarn.lock for package ansi-wrap:0.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0001]  WARN unable to extract licenses from javascript yarn.lock for package example:8.8.8: unable to parse license from npm registry: json: cannot unmarshal string into Go value of type struct { License string "json:\"license\"" }
[0001]  WARN unable to extract licenses from javascript yarn.lock for package glob-base:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-primitive:2.0.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-whitespace:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package kind-of:1.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package preserve:0.2.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
$ SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true SYFT_LOG_LEVEL=info syft . -o syft-json=sbom.cyclonedx.json
[0000]  INFO syft version: 0.105.0
[0000]  WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
[0000]  WARN unable to extract licenses from javascript yarn.lock for package ansi-wrap:0.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0001]  WARN unable to extract licenses from javascript yarn.lock for package example:8.8.8: unable to parse license from npm registry: json: cannot unmarshal string into Go value of type struct { License string "json:\"license\"" }
[0001]  WARN unable to extract licenses from javascript yarn.lock for package glob-base:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-primitive:2.0.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-whitespace:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package kind-of:1.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package preserve:0.2.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
$ yarn -v
1.22.19

The yarn.lock file is simple too

# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1


ansi-wrap@0.1.0:
  version "0.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/ansi-wrap/-/ansi-wrap-0.1.0.tgz#a82250ddb0015e9a27ca82e82ea603bbfa45efaf"
  integrity sha512-ZyznvL8k/FZeQHr2T6LzcJ/+vBApDnMNZvfVFy3At0knswWd6rJ3/0Hhmpu8oqa6C92npmozs890sX9Dl6q+Qw==

array-slice@0.1.0:
  version "0.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/array-slice/-/array-slice-0.1.0.tgz#12adfc0238fc6a29e6ab5a4b7789c6ce7b723dc6"
  integrity sha512-hC286ytySez3XJWkjsBjugydgPZJXiHvwZNegJUIs+Xs5Ovslm7UfAlijFjYq7rJP4aUGdCF9FfWy7lPd1m4/A==

glob-base@0.3.0:
  version "0.3.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/glob-base/-/glob-base-0.3.0.tgz#dbb164f6221b1c0b1ccf82aea328b497df0ea3c4"
  integrity sha512-ab1S1g1EbO7YzauaJLkgLp7DZVAqj9M/dvKlTt8DkXA2tiOIcSMrlVI2J1RZyB5iJVccEscjGn+kpOG9788MHA==
  dependencies:
    glob-parent "^2.0.0"
    is-glob "^2.0.0"

glob-parent@^2.0.0:
  version "2.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/glob-parent/-/glob-parent-2.0.0.tgz#81383d72db054fcccf5336daa902f182f6edbb28"
  integrity sha512-JDYOvfxio/t42HKdxkAYaCiBN7oYiuxykOxKxdaUW5Qn0zaYN3gRQWolrwdnf0shM9/EP0ebuuTmyoXNr1cC5w==
  dependencies:
    is-glob "^2.0.0"

is-extglob@^1.0.0:
  version "1.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-extglob/-/is-extglob-1.0.0.tgz#ac468177c4943405a092fc8f29760c6ffc6206c0"
  integrity sha512-7Q+VbVafe6x2T+Tu6NcOf6sRklazEPmBoB3IWk3WdGZM2iGUwU/Oe3Wtq5lSEkDTTlpp8yx+5t4pzO/i9Ty1ww==

is-glob@^2.0.0:
  version "2.0.1"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-glob/-/is-glob-2.0.1.tgz#d096f926a3ded5600f3fdfd91198cb0888c2d863"
  integrity sha512-a1dBeB19NXsf/E0+FHqkagizel/LQw2DjSQpvQrj3zT+jYPpaUCryPnrQajXKFLCMuf4I6FhRpaGtw4lPrG6Eg==
  dependencies:
    is-extglob "^1.0.0"

is-primitive@2.0.0:
  version "2.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-primitive/-/is-primitive-2.0.0.tgz#207bab91638499c07b2adf240a41a87210034575"
  integrity sha512-N3w1tFaRfk3UrPfqeRyD+GYDASU3W5VinKhlORy8EWVf/sIdDL9GAcew85XmktCfH+ngG7SRXEVDoO18WMdB/Q==

is-whitespace@0.3.0:
  version "0.3.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-whitespace/-/is-whitespace-0.3.0.tgz#1639ecb1be036aec69a54cbb401cfbed7114ab7f"
  integrity sha512-RydPhl4S6JwAyj0JJjshWJEFG6hNye3pZFBRZaTUfZFwGHxzppNaNOVgQuS/E/SlhrApuMXrpnK1EEIXfdo3Dg==

kind-of@1.1.0:
  version "1.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/kind-of/-/kind-of-1.1.0.tgz#140a3d2d41a36d2efcfa9377b62c24f8495a5c44"
  integrity sha512-aUH6ElPnMGon2/YkxRIigV32MOpTVcoXQ1Oo8aYn40s+sJ3j+0gFZsT8HKDcxNy7Fi9zuquWtGaGAahOdv5p/g==

preserve@0.2.0:
  version "0.2.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/preserve/-/preserve-0.2.0.tgz#815ed1f6ebc65926f865b310c0713bcb3315ce4b"
  integrity sha512-s/46sYeylUfHNjI+sA/78FAHlmIuKqI9wNnzEOGehAlUUYeObv5C2mOinXBjyUyWmJ2SfcS2/ydApH4hTF4WXQ==

atl-mk avatar Feb 16 '24 12:02 atl-mk

Hi @atl-mk, thanks for the detailed info! I've been able to reproduce the issue and have an idea for the fix, and will add this to our backlog. Details below:

It looks like the NPM registry doesn't always return a license shaped the way we expect. In Syft's code, we assume that the license field on the returned object will be a string, but it looks like sometimes it can be an object:

❯ curl -s https://registry.npmjs.org/tiny-tarball/1.0.0 | jq .license
"ISC"

❯ curl -s https://registry.npmjs.org/ansi-wrap/0.1.0 | jq .license
{
  "type": "MIT",
  "url": "https://github.com/jonschlinkert/ansi-wrap/blob/master/LICENSE"
}

So for ansi-wrap, we get an object back, and for tiny-tarball, we get a single string.

But Syft assumes it will always be a single string (see https://github.com/anchore/syft/blob/98de2e2f6205b1660f98915cbed22695821fa9c8/syft/pkg/cataloger/javascript/package.go#L186-L188), so this functionality is broken for packages that have an object in their license field.

Dev notes: The next step is to change our deserialization to work with either an object or a string being returned in the license field.
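
A minimal sketch of what that deserialization change could look like, assuming a custom `UnmarshalJSON` is an acceptable approach. The names `licenseField`, `registryResponse`, and `parseLicense` are hypothetical and are not taken from Syft's code; the sketch only covers the two shapes shown in the curl output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// licenseField is a hypothetical type (not Syft's actual code) that accepts
// either a plain string or an object with a "type" key.
type licenseField struct {
	Value string
}

// UnmarshalJSON tries the string shape first and falls back to the object shape.
func (l *licenseField) UnmarshalJSON(data []byte) error {
	// Common shape: "license": "ISC"
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		l.Value = s
		return nil
	}
	// Object shape: "license": {"type": "MIT", "url": "..."}
	var obj struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return fmt.Errorf("unsupported license shape: %w", err)
	}
	l.Value = obj.Type
	return nil
}

// registryResponse models only the part of the registry reply used here.
type registryResponse struct {
	License licenseField `json:"license"`
}

// parseLicense extracts the license name from a registry response body.
func parseLicense(body []byte) (string, error) {
	var resp registryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	return resp.License.Value, nil
}

func main() {
	bodies := []string{
		`{"license": "ISC"}`,
		`{"license": {"type": "MIT", "url": "https://github.com/jonschlinkert/ansi-wrap/blob/master/LICENSE"}}`,
	}
	for _, body := range bodies {
		name, err := parseLicense([]byte(body))
		fmt.Println(name, err)
	}
}
```

Trying the string shape first keeps the common case cheap, and any shape that is neither a string nor an object still surfaces as an error rather than being silently dropped.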

willmurphyscode avatar Feb 27 '24 19:02 willmurphyscode

Thanks @willmurphyscode

Note that it can also be an array of objects. See https://docs.npmjs.com/cli/v10/configuring-npm/package-json#license for an example. Although NPM has deprecated this shape, many packages still use it.
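
For illustration, here is a rough Go sketch of normalizing all three shapes (string, object, array of objects) into a list of license names. The helper names `licenseNames` and `collect` are made up for this example and are not part of Syft. It assumes the caller passes in the raw JSON of whichever field holds the license data, since the array form may live under a different key than the string form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// licenseNames is a hypothetical helper (not part of Syft) that turns the raw
// JSON of a license field into a flat list of license names.
func licenseNames(raw []byte) ([]string, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return collect(v)
}

// collect walks the decoded value and handles each supported shape.
func collect(v any) ([]string, error) {
	switch t := v.(type) {
	case nil:
		// JSON null: no license declared.
		return nil, nil
	case string:
		return []string{t}, nil
	case map[string]any:
		name, ok := t["type"].(string)
		if !ok {
			return nil, fmt.Errorf("license object has no string \"type\" key")
		}
		return []string{name}, nil
	case []any:
		var names []string
		for _, item := range t {
			sub, err := collect(item)
			if err != nil {
				return nil, err
			}
			names = append(names, sub...)
		}
		return names, nil
	default:
		return nil, fmt.Errorf("unsupported license shape %T", v)
	}
}

func main() {
	raw := `[{"type": "MIT", "url": "a"}, {"type": "Apache-2.0", "url": "b"}]`
	names, err := licenseNames([]byte(raw))
	fmt.Println(names, err) // prints: [MIT Apache-2.0] <nil>
}
```

Decoding into `any` and switching on the result avoids a chain of trial unmarshals, at the cost of losing the typed struct.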

atl-mk avatar Feb 28 '24 15:02 atl-mk