scancode-toolkit

Missing License/Question: MIT-0 not detected and OR license expression misinterpreted as AND

Open JustinWonjaePark opened this issue 3 months ago • 6 comments

Description

When scanning constant_time_eq v0.3.1, I observed the following issues:

  1. MIT-0 not detected

    • In Cargo.toml, the declared license expression is:
      license = "CC0-1.0 OR MIT-0 OR Apache-2.0"
      
    • However, only CC0-1.0 and Apache-2.0 are detected.
    • MIT-0 is missing from the scan result.
  2. OR misinterpreted as AND

    • The above license expression (CC0-1.0 OR MIT-0 OR Apache-2.0) is reported as CC0-1.0 AND Apache-2.0.
    • This changes the intended meaning from a license choice (OR) to a license conjunction (AND).
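To make the difference in meaning concrete, here is a minimal Python sketch (not ScanCode code). It handles only flat SPDX expressions, meaning no parentheses and no WITH, and applies the SPDX rule that AND binds tighter than OR. It shows which license IDs the reported expression drops, and whether accepting a single license is enough under each expression. The function names alternatives, license_ids, and is_satisfied are hypothetical.

```python
"""Sketch: compare a declared flat SPDX expression with a reported one.

Assumes expressions without parentheses or WITH. Per the SPDX spec,
AND binds tighter than OR, so "A OR B AND C" means "A OR (B AND C)".
"""


def alternatives(expr: str) -> list[list[str]]:
    """Split a flat expression into OR-alternatives; each alternative is
    the list of license IDs that must all be accepted (joined by AND)."""
    if "(" in expr or ")" in expr or " WITH " in f" {expr} ":
        raise ValueError(f"not a flat expression: {expr!r}")
    result: list[list[str]] = [[]]
    for token in expr.split():
        if token == "OR":
            result.append([])  # start a new alternative
        elif token != "AND":
            result[-1].append(token)  # license ID required in this alternative
    return result


def license_ids(expr: str) -> set[str]:
    """All license IDs mentioned anywhere in the expression."""
    return {lic for alt in alternatives(expr) for lic in alt}


def is_satisfied(expr: str, accepted: set[str]) -> bool:
    """True if at least one alternative uses only accepted licenses."""
    return any(all(lic in accepted for lic in alt) for alt in alternatives(expr))


if __name__ == "__main__":
    declared = "CC0-1.0 OR MIT-0 OR Apache-2.0"  # from Cargo.toml
    reported = "CC0-1.0 AND Apache-2.0"          # as reported in the scan
    print(sorted(license_ids(declared) - license_ids(reported)))  # ['MIT-0']
    print(is_satisfied(declared, {"MIT-0"}))       # True: any one license suffices
    print(is_satisfied(reported, {"Apache-2.0"}))  # False: both are required
```

Under the declared expression, a recipient may pick any one of the three licenses; under the reported one, both remaining licenses would apply at once, and MIT-0 is no longer an option.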

How To Reproduce

git clone --branch 0.3.1 https://github.com/cesarb/constant_time_eq.git
scancode -cli --json-pp - constant_time_eq > result.json

System configuration

OS: macOS 15.6.1 (x86_64)
ScanCode Toolkit version: 32.4.1
Installation method: pip

Questions

In addition to the bug report, I would like to confirm two points about how license expressions are represented in the scan output:

  1. Multiple detections per file
  • If files[].license_detections[].matches[].license_expression contains multiple entries for a single file, are they always combined into files[].license_detections[].matches[].detected_license_expression with an AND operator?
  • Or can they sometimes be combined differently (e.g., OR)?
  2. Difference between fields
  • What is the exact difference between the following two fields?
    • files[].license_detections[].matches[].detected_license_expression
    • files[].license_detections[].matches[].detected_license_expression_spdx
  • When should each be used?
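One way to see how these fields relate for a single file is to print them side by side. The sketch below assumes a ScanCode 32.x-style JSON layout, which should be checked against your own result.json: detected_license_expression and detected_license_expression_spdx on each files[] entry, license_expression on each license_detections[] entry, and license_expression on each matches[] entry. It does not answer how ScanCode combines them; it only makes the values easy to compare. The function name license_fields is hypothetical.

```python
import json
import sys


def license_fields(scan: dict, path_suffix: str) -> list[dict]:
    """Collect license-related fields for files whose path ends with path_suffix.

    Assumed layout (ScanCode 32.x style; verify against your own output):
    file-level detected_license_expression(_spdx), per-detection
    license_expression, and per-match license_expression.
    """
    rows = []
    for f in scan.get("files", []):
        if not f.get("path", "").endswith(path_suffix):
            continue
        detections = f.get("license_detections") or []
        rows.append({
            "path": f["path"],
            "file_expression": f.get("detected_license_expression"),
            "file_expression_spdx": f.get("detected_license_expression_spdx"),
            "detection_expressions": [d.get("license_expression") for d in detections],
            "match_expressions": [
                m.get("license_expression")
                for d in detections
                for m in d.get("matches") or []
            ],
        })
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script>.py result.json
    with open(sys.argv[1], encoding="utf-8") as fp:
        for row in license_fields(json.load(fp), "Cargo.toml"):
            print(json.dumps(row, indent=2))
```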

JustinWonjaePark • Sep 29 '25 08:09

@JustinWonjaePark Thank you for the report! We will get back to you with details.

pombredanne • Sep 29 '25 10:09

@JustinWonjaePark from a quick look, you may want to use the --package option, which is aware of Cargo.toml files.

  • A quick test (and an unrelated bug) in https://github.com/aboutcode-org/scancode-toolkit/issues/4581 shows that the detection seems to work fine there.

pombredanne • Sep 29 '25 10:09

MIT-0 detection and OR vs AND issue:

I can confirm the problem you've identified. The license expression from Cargo.toml should be preserved as CC0-1.0 OR MIT-0 OR Apache-2.0, but it's being incorrectly converted to CC0-1.0 AND Apache-2.0 with MIT-0 missing entirely.

As @pombredanne mentioned, using the --package option should help, as it's specifically designed to handle package manifest files like Cargo.toml. Could you try:

scancode --package --json-pp result.json constant_time_eq

This should correctly parse the license declaration from the Cargo metadata.
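To read the declared license back out of such a scan on a per-file basis, a minimal sketch follows. It assumes that, in a --package scan, each files[] entry carries a package_data list whose items include declared_license_expression_spdx (ScanCode 32.x-style output; confirm the field names in your own result.json). The function name declared_from_manifests is hypothetical.

```python
import json
import sys


def declared_from_manifests(scan: dict, manifest_name: str = "Cargo.toml") -> dict[str, list]:
    """Map each manifest path to the declared SPDX expressions of its package data.

    Assumes each files[] entry may carry a package_data list whose items have
    a declared_license_expression_spdx field.
    """
    found = {}
    for f in scan.get("files", []):
        path = f.get("path", "")
        if path.rsplit("/", 1)[-1] != manifest_name:  # exact file-name match
            continue
        found[path] = [
            pkg.get("declared_license_expression_spdx")
            for pkg in f.get("package_data") or []
        ]
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script>.py result.json
    with open(sys.argv[1], encoding="utf-8") as fp:
        for path, exprs in declared_from_manifests(json.load(fp)).items():
            print(path, exprs)
```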

karthiknew07 • Sep 29 '25 12:09

@pombredanne, @karthiknew07 Thanks for the quick reply and clear explanation! I tried scanning with the --package option and confirmed that, as you mentioned, it retrieves the correct license information for the package. However, in my case, I’ve been using the ScanCode package within FOSSLight Source Scanner (https://github.com/fosslight/fosslight_source_scanner) to get license information for each individual file being scanned. That’s why I’ve been focusing on the results in files[].license_detections[].matches[].license_expression. Any help would be greatly appreciated!

JustinWonjaePark • Sep 30 '25 00:09

Thank you for the clarification! I understand now: you need accurate per-file license detection, not just package-level metadata. The issue you're experiencing is indeed a bug in how ScanCode processes the license information from Cargo.toml at the file level. The Cargo.toml file itself should show the license expression as declared (CC0-1.0 OR MIT-0 OR Apache-2.0), but it's being incorrectly parsed and combined.

I'm investigating why:

  1. MIT-0 is not being detected: this could be caused by a missing license key or detection rule.
  2. OR is being converted to AND: this is incorrect behavior when parsing declared license expressions from manifest files.

For your use case with FOSSLight Source Scanner, which relies on per-file license detection results, this is definitely something we need to fix properly. I'll prioritize looking into:

  • The license detection rules for MIT-0
  • The logic that parses and combines license expressions from Cargo.toml
  • Ensuring the OR operators are preserved correctly in the file-level results

I'll keep you updated on the progress and let you know once there's a fix available. In the meantime, if you find any workarounds while using FOSSLight Source Scanner, please share them as they might help others with similar setups.

karthiknew07 • Sep 30 '25 12:09

Actually, my team and I have been discussing possible workarounds for this issue, but we haven’t found any good ones yet. One rough idea was to use the matched_text to check whether it consists solely of an SPDX license expression, but that seems time-consuming and wouldn’t really solve the problem. I'll keep you posted if anything comes up.
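The matched_text idea could be prototyped roughly as follows. This is a minimal, hypothetical sketch: it assumes matched_text is a plain string, checks only the first double-quoted value if one is present (as in a Cargo.toml line such as license = "..."), and validates the token sequence with a loose grammar. License IDs are not checked against the SPDX license list, operators must be uppercase, and WITH is not validated strictly. It also cannot recover licenses that ScanCode never matched, such as MIT-0 here, which is consistent with the concern that it would not fully solve the problem.

```python
import re
from typing import Optional

# Loose pattern for an SPDX license ID or LicenseRef, with an optional trailing "+".
_LICENSE_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*\+?")
# First double-quoted value, e.g. the right-hand side of license = "...".
_QUOTED = re.compile(r'"([^"]*)"')


def extract_spdx_expression(matched_text: str) -> Optional[str]:
    """Return the expression if matched_text is only an SPDX expression, else None."""
    m = _QUOTED.search(matched_text)
    candidate = (m.group(1) if m else matched_text).strip()
    tokens = candidate.replace("(", " ( ").replace(")", " ) ").split()
    expect_operand, depth = True, 0
    for tok in tokens:
        if tok == "(":
            if not expect_operand:
                return None
            depth += 1
        elif tok == ")":
            if expect_operand or depth == 0:  # empty group or unbalanced close
                return None
            depth -= 1
        elif tok in ("AND", "OR", "WITH"):
            if expect_operand:  # operator with no left operand
                return None
            expect_operand = True
        elif expect_operand and _LICENSE_ID.fullmatch(tok):
            expect_operand = False
        else:
            return None  # prose word, or two IDs in a row
    if not tokens or expect_operand or depth:
        return None
    return candidate
```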

JustinWonjaePark • Sep 30 '25 23:09