scancode-toolkit

distro is passed as None for RPM packages

Open TG1999 opened this issue 1 year ago • 8 comments

Description

For Debian-based packages we pass namespace=distro here https://github.com/nexB/scancode-toolkit/blob/8ed266372416a4e55cf739dadafa175214dca980/src/packagedcode/debian.py#L634 but we never pass distro, apart from the case where it's distroless.

How To Reproduce

Look at the issue details

TG1999 avatar Jun 30 '23 16:06 TG1999

Hey @TG1999, did you mean handling a "None" value for 'distro', or setting a default value for distro?

ai-naymul avatar Aug 09 '23 17:08 ai-naymul

See also Input PURL does not match output PURL #1274, just added a few minutes ago.

johnmhoran avatar Aug 17 '23 17:08 johnmhoran

A rough idea from my side. What do you think, @TG1999?

xonx4l avatar Aug 30 '23 13:08 xonx4l

Some hints for solving this:

  1. the distro needs to be found first. There may be several ways to do this: from data available inside a package (a .deb), from its version or name (where there may be a hint that it's from Ubuntu), or, for installed packages, from the rootfs, where we can collect a distro with https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/distro.py

  2. once found, use this as the namespace

  3. if not found, should we have a default? I am not sure, but using debian may make sense.
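
The three hints above could be sketched roughly as follows. This is only an illustration, not scancode-toolkit code: the function name `guess_namespace`, the `VERSION_CLUES` patterns and the parameter names are all hypothetical, and the real clues would need to be checked against actual package data.

```python
import re

# Hypothetical clue patterns for illustration; not taken from scancode-toolkit.
VERSION_CLUES = (
    (re.compile(r"ubuntu", re.IGNORECASE), "ubuntu"),
    (re.compile(r"deb\d+u\d+", re.IGNORECASE), "debian"),
)


def guess_namespace(version=None, rootfs_distro=None, default=None):
    """Return a PURL namespace for a .deb package, or `default` if no clue is found."""
    # Hint 1: a distro collected from the rootfs is treated as the strongest signal.
    if rootfs_distro:
        return rootfs_distro.lower()
    # Hint 1 (continued): otherwise look for a hint in the version string.
    if version:
        for pattern, namespace in VERSION_CLUES:
            if pattern.search(version):
                return namespace
    # Hint 3: nothing found, so fall back to whatever default the caller chose.
    return default
```

Whether the default should be `None` or `"debian"` is exactly the open question in hint 3, so the sketch leaves that choice to the caller.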

pombredanne avatar Sep 01 '23 14:09 pombredanne

Don't we have the same/similar problem for redhat, fedora or centos for type=rpm? We need some well-documented default in all cases

mjherzog avatar Sep 01 '23 16:09 mjherzog

> Don't we have the same/similar problem for redhat, fedora or centos for type=rpm? We need some well-documented default in all cases

Yes! The issue exists with RPMs too

pombredanne avatar Sep 01 '23 19:09 pombredanne

This is fixed for debian by the following PRs:

  • https://github.com/nexB/scancode.io/pull/1096
  • https://github.com/nexB/scancode-toolkit/pull/3682

We now detect the namespace from clues found in package attributes, and provide a default namespace of debian if no clues are present. On the SCIO side we then use the distro_id to override this too, if there are irregularities there.

We need to do something similar for RPM.

AyanSinhaMahapatra avatar Apr 04 '24 13:04 AyanSinhaMahapatra

For RPMs, the thing to do is IMHO the same: use the /etc/os-release "identifier" field (as in identifier: rhel) for the distro, and use this as the namespace for the RPM PURL.
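
A minimal sketch of that idea, under a few assumptions: that the "identifier" corresponds to the `ID` key in the raw os-release file, that the file content is already available as text, and that the PURL can be formatted by hand without percent-encoding. The helper names `parse_os_release_id` and `rpm_purl` are hypothetical and are not part of scancode-toolkit.

```python
def parse_os_release_id(text):
    """Return the value of the ID key from os-release content, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "ID":
            # Values may be wrapped in single or double quotes.
            return value.strip().strip("'\"") or None
    return None


def rpm_purl(name, version, distro=None):
    """Format an RPM PURL string, using the distro as namespace when it is known."""
    # Simplified formatting for illustration: no percent-encoding or qualifiers.
    namespace = f"{distro}/" if distro else ""
    return f"pkg:rpm/{namespace}{name}@{version}"


sample = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\nID_LIKE="fedora"\n'
print(rpm_purl("bash", "5.1.8", parse_os_release_id(sample)))
```

When no distro is found, the sketch simply omits the namespace, which mirrors the current behaviour described in this issue; a documented default could be substituted there.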

pombredanne avatar Apr 29 '24 15:04 pombredanne