ort
ort copied to clipboard
fix(SpdxExpression): Fix the DNF algorithm
Whenever a SpdxExpression contains an AND operator, the DNF needs to be calculated again.
For instance the correct DNF for a AND (b OR c OR d) is (a AND b) OR (a AND c) OR (a AND d). Before, the calculated DNF for this expression was (a AND (b OR c)) OR (a AND d) which is not a valid DNF.
I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz
Results on my machine
Current main: 13 seconds With fix: 1 min 14 seconds.
Therefore, the changes are currently not feasible :/
Codecov Report
Patch and project coverage have no change.
Comparison is base (
8a48a6d) 64.82% compared to head (58e0e6b) 64.82%.
Additional details and impacted files
@@ Coverage Diff @@
## main #6721 +/- ##
=========================================
Coverage 64.82% 64.82%
Complexity 1938 1938
=========================================
Files 320 320
Lines 16024 16024
Branches 2281 2281
=========================================
Hits 10388 10388
Misses 4657 4657
Partials 979 979
| Flag | Coverage Δ | |
|---|---|---|
| funTest-non-docker | 49.53% <ø> (ø) |
|
| test | 31.06% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Extracting the distributeAndOverOr function and only calling this recursively results in ~45 seconds evaluation time on my machine.
I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).
Extracting the
distributeAndOverOrfunction and only calling this recursively results in ~45 seconds evaluation time on my machine. I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).
I guess you are right because according to Wikipedia the problem is np-complete: https://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm#Complexity (Cline-McCluskey calculates the minimal DNF which is not strictly required for our use case, but calculating a non-minimal DNF is probably not much less complex)
I see basically two options how we can proceed:
- Check if implementing another algorithm like Quine-McCluskey provides significantly better performance.
- Check why the algorithm fix affects performance so much: Do we calculate the DNF too often? Could we maybe cache the result? Do we create too many objects during the calculation? Is there one specific very long expression that takes so long or does the problem also exist for many simple expressions?
I wonder whether disjunctiveNormalForm() is still needed after all... in https://github.com/oss-review-toolkit/ort/pull/8106 it seems we could simply remove it.
I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz
@MarcelBochtler what operation exactly did you perform on the result for benchmarking?
I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz
@MarcelBochtler what operation exactly did you perform on the result for benchmarking?
I simply ran the evaluator using the linked scan-result.json:
ort evaluate -i scan-result.json -o ort/ --license-classifications-file license-classifications.yml --rules-file evaluator.rules.kts
I cannot share the license-classifications and the evaluator rules.
In the evaluator.rules.kts we do use expression.validChoices() on every expression that offers a choice. Other than that, I don't see anything that could be related to this issue.
I wonder whether
disjunctiveNormalForm()is still needed after all... in #8106 it seems we could simplify remove it.
Yes looks like it. Should I close this PR?
Should I close this PR?
For the sake of cleaning things up a little, I'd say yes. I believe in the context of the current code base there now nothing anymore in this PR that we could take / reuse.