ort icon indicating copy to clipboard operation
ort copied to clipboard

fix(SpdxExpression): Fix the DNF algorithm

Open MarcelBochtler opened this issue 2 years ago • 6 comments

Whenever a SpdxExpression contains an AND operator, the DNF needs to be calculated again. For instance the correct DNF for a AND (b OR c OR d) is (a AND b) OR (a AND c) OR (a AND d). Before, the calculated DNF for this expression was (a AND (b OR c)) OR (a AND d) which is not a valid DNF.

MarcelBochtler avatar Mar 20 '23 12:03 MarcelBochtler

I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

Results on my machine

Current main: 13 seconds With fix: 1 min 14 seconds.

Therefore, the changes are currently not feasible :/

MarcelBochtler avatar Mar 20 '23 12:03 MarcelBochtler

Codecov Report

Patch and project coverage have no change.

Comparison is base (8a48a6d) 64.82% compared to head (58e0e6b) 64.82%.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #6721   +/-   ##
=========================================
  Coverage     64.82%   64.82%           
  Complexity     1938     1938           
=========================================
  Files           320      320           
  Lines         16024    16024           
  Branches       2281     2281           
=========================================
  Hits          10388    10388           
  Misses         4657     4657           
  Partials        979      979           
Flag Coverage Δ
funTest-non-docker 49.53% <ø> (ø)
test 31.06% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Mar 20 '23 12:03 codecov[bot]

Extracting the distributeAndOverOr function and only calling this recursively results in ~45 seconds evaluation time on my machine. I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).

MarcelBochtler avatar Mar 21 '23 08:03 MarcelBochtler

Extracting the distributeAndOverOr function and only calling this recursively results in ~45 seconds evaluation time on my machine. I doubt that this problem can be solved in a non-exponential time complexity (in the worst case).

I guess you are right because according to Wikipedia the problem is np-complete: https://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm#Complexity (Cline-McCluskey calculates the minimal DNF which is not strictly required for our use case, but calculating a non-minimal DNF is probably not much less complex)

I see basically two options how we can proceed:

  • Check if implementing another algorithm like Quine-McCluskey provides significantly better performance.
  • Check why the algorithm fix affects performance so much: Do we calculate the DNF too often? Could we maybe cache the result? Do we create too many objects during the calculation? Is there one specific very long expression that takes so long or does the problem also exist for many simple expressions?

mnonnenmacher avatar Mar 21 '23 13:03 mnonnenmacher

I wonder whether disjunctiveNormalForm() is still needed after all... in https://github.com/oss-review-toolkit/ort/pull/8106 it seems we could simply remove it.

sschuberth avatar Jan 11 '24 22:01 sschuberth

I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

@MarcelBochtler what operation exactly did you perform on the result for benchmarking?

sschuberth avatar Jan 17 '24 09:01 sschuberth

I'm using this scanner result to check the performance impact of this change: https://github.com/fviernau/orthw-demo/blob/main/scan-results/scan-result.json.xz

@MarcelBochtler what operation exactly did you perform on the result for benchmarking?

I simply ran the evaluator using the linked scan-result.json:

ort evaluate -i scan-result.json -o ort/ --license-classifications-file license-classifications.yml --rules-file evaluator.rules.kts

I cannot share the license-classifications and the evaluator rules. In the evaluator.rules.kts we do use expression.validChoices() on every expression that offers a choice. Other than that, I don't see anything that could be related to this issue.

I wonder whether disjunctiveNormalForm() is still needed after all... in #8106 it seems we could simplify remove it.

Yes looks like it. Should I close this PR?

MarcelBochtler avatar Mar 25 '24 08:03 MarcelBochtler

Should I close this PR?

For the sake of cleaning things up a little, I'd say yes. I believe in the context of the current code base there now nothing anymore in this PR that we could take / reuse.

sschuberth avatar Mar 25 '24 20:03 sschuberth