Missing License/Question : MIT-0 not detected and OR license expression misinterpreted as AND
Description
When scanning constant_time_eq v0.3.1, I observed the following issues:
- MIT-0 not detected
  - In Cargo.toml, the declared license expression is: license = "CC0-1.0 OR MIT-0 OR Apache-2.0"
  - However, only CC0-1.0 and Apache-2.0 are detected. MIT-0 is missing from the scan result.
- OR misinterpreted as AND
  - The above license expression (CC0-1.0 OR MIT-0 OR Apache-2.0) is reported as CC0-1.0 AND Apache-2.0.
  - This changes the intended meaning from a license choice (OR) to a license conjunction (AND).
How To Reproduce
git clone --branch 0.3.1 https://github.com/cesarb/constant_time_eq.git
cd constant_time_eq
scancode -cli --json-pp - > result.json constant_time_eq
System configuration
OS: macOS 15.6.1 (x86_64)
ScanCode Toolkit version: 32.4.1
Installation method: pip
Questions
In addition to the bug report, I would like to confirm two points about how license expressions are represented in the scan output:
- Multiple detections per file
  - If files[].license_detections[].matches[].license_expression contains multiple entries for a single file, are they always combined into files[].license_detections[].matches[].detected_license_expression with an AND operator?
  - Or can they sometimes be combined differently (e.g., OR)?
- Difference between fields
  - What is the exact difference between:
    - files[].license_detections[].matches[].detected_license_expression
    - files[].license_detections[].matches[].detected_license_expression_spdx
  - When should each be used?
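To compare these fields side by side for each file, a small script along the following lines may help. It is only a sketch: it assumes the combined expressions sit at file level as detected_license_expression and detected_license_expression_spdx, and that each entry in license_detections carries its own license_expression. The helper name summarize_file_licenses is hypothetical, and the sample data is made up to mirror the behavior reported above; it is not real scan output.

```python
import json


def summarize_file_licenses(scan):
    """Collect per-file license expressions from a ScanCode-style JSON dict.

    The field names are assumptions; .get() keeps the sketch tolerant of
    output layouts that differ between ScanCode versions.
    """
    rows = []
    for entry in scan.get("files", []):
        detections = entry.get("license_detections") or []
        rows.append({
            "path": entry.get("path"),
            # Combined expression for the whole file (ScanCode license keys).
            "expression": entry.get("detected_license_expression"),
            # The same expression written with SPDX identifiers, if present.
            "expression_spdx": entry.get("detected_license_expression_spdx"),
            # Expression of each individual detection, before combination.
            "per_detection": [d.get("license_expression") for d in detections],
        })
    return rows


# Made-up sample shaped like the result described in this issue.
sample = json.loads("""
{"files": [{
  "path": "constant_time_eq/Cargo.toml",
  "detected_license_expression": "cc0-1.0 AND apache-2.0",
  "detected_license_expression_spdx": "CC0-1.0 AND Apache-2.0",
  "license_detections": [
    {"license_expression": "cc0-1.0"},
    {"license_expression": "apache-2.0"}
  ]
}]}
""")

for row in summarize_file_licenses(sample):
    print(row)
```

To run it against a real scan, load result.json with json.load and pass the resulting dict to the helper instead of the sample.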
@JustinWonjaePark Thank you for the report! We will get back to you with details.
@JustinWonjaePark from a quick look, you may want to use the --package option that is aware of Cargo.toml files
- A quick test (and an unrelated bug) in https://github.com/aboutcode-org/scancode-toolkit/issues/4581 shows that the detection seems to work fine there.
MIT-0 detection and OR vs AND issue:
I can confirm the problem you've identified. The license expression from Cargo.toml should be preserved as CC0-1.0 OR MIT-0 OR Apache-2.0, but it's being incorrectly converted to CC0-1.0 AND Apache-2.0 with MIT-0 missing entirely.
As @pombredanne mentioned, using the --package option should help, as it's specifically designed to handle package manifest files like Cargo.toml. Could you try:
scancode --package --json-pp result.json constant_time_eq
This should correctly parse the license declaration from the Cargo metadata.
@pombredanne, @karthiknew07 Thanks for the quick reply and clear explanation! I actually tried scanning with the --package option and confirmed that it retrieves the correct license information for the package as you mentioned. However, in my case, I’ve been using the ScanCode package within FOSSLight Source Scanner (https://github.com/fosslight/fosslight_source_scanner) to get license information for each individual file being scanned. That’s why I’ve been focusing on the results in files[].license_detections[].matches[].license_expression. Any help would be greatly appreciated!
Thank you for the clarification! I understand now - you need accurate per-file license detection, not just package-level metadata.
The issue you're experiencing is indeed a bug in how ScanCode is processing the license information from Cargo.toml at the file level. The Cargo.toml file itself should show the license expression as declared (CC0-1.0 OR MIT-0 OR Apache-2.0), but it's being incorrectly parsed and combined.
I'm investigating why:
- MIT-0 is not being detected - This could be a missing license key or detection rule issue
- OR is being converted to AND - This is incorrect behavior when parsing declared license expressions from manifest files
For your use case with FOSSLight Source Scanner, which relies on per-file license detection results, this is definitely something we need to fix properly. I'll prioritize looking into:
- The license detection rules for MIT-0
- The logic that parses and combines license expressions from Cargo.toml
- Ensuring the OR operators are preserved correctly in the file-level results
I'll keep you updated on the progress and let you know once there's a fix available. In the meantime, if you find any workarounds while using FOSSLight Source Scanner, please share them as they might help others with similar setups.
Actually, my team and I have been discussing possible workarounds for this issue, but we haven’t found any good ones yet. One rough idea was to use the matched_text to check if it consists only of an SPDX License Expression, but that seems time-consuming and wouldn’t really solve the problem. I'll keep you posted if anything comes up.
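For reference, the matched_text idea could be sketched roughly as follows. This is not part of ScanCode or FOSSLight; the function names are hypothetical, the grammar is a simplified reading of SPDX license expressions (uppercase AND, OR, WITH, and parentheses), and identifiers are not checked against the SPDX license list, so any single word would pass. It also assumes that matched_text for a Cargo.toml hit may look like license = "...".

```python
import re

# A token is a parenthesis or a run of characters allowed in SPDX identifiers.
_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:-]+)")
_OPERATORS = {"AND", "OR", "WITH"}
_PARENS = {"(", ")"}
# Assumed shape of a Cargo.toml license line; other manifests differ.
_CARGO_LICENSE = re.compile(r'^\s*license\s*=\s*"([^"]*)"\s*$')


def tokenize(text):
    """Split text into tokens, or return None on an unexpected character."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if not match:
            return None
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_spdx_expression(text):
    """Return True if text consists only of an SPDX-style expression."""
    tokens = tokenize(text)
    if not tokens:
        return False
    pos = 0

    def is_identifier(index):
        return (index < len(tokens)
                and tokens[index] not in _OPERATORS
                and tokens[index] not in _PARENS)

    def parse_term():
        # term := "(" expr ")" | identifier ["WITH" identifier]
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if not parse_expr():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        if not is_identifier(pos):
            return False
        pos += 1
        if pos < len(tokens) and tokens[pos] == "WITH":
            if not is_identifier(pos + 1):
                return False
            pos += 2
        return True

    def parse_expr():
        # expr := term {("AND" | "OR") term}
        nonlocal pos
        if not parse_term():
            return False
        while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
            pos += 1
            if not parse_term():
                return False
        return True

    return parse_expr() and pos == len(tokens)


def expression_from_matched_text(matched_text):
    """Return the declared expression if matched_text holds nothing else."""
    match = _CARGO_LICENSE.match(matched_text)
    candidate = match.group(1) if match else matched_text
    return candidate.strip() if is_spdx_expression(candidate) else None


print(expression_from_matched_text('license = "CC0-1.0 OR MIT-0 OR Apache-2.0"'))
```

When the helper returns an expression, a caller could prefer it over the combined per-file result, which would keep the OR operators and MIT-0 as declared. As noted above, this post-processes the output and does not fix the underlying detection.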