licenseclassifier
Bug in the computeQ function - v2 classifier
Describe the issue
In computeQ, when the threshold is set to 1.0 the granularity is calculated as 10. If the threshold is set to 0.95, 0.99, or 0.999, however, the granularity is calculated as 19, 99, and 999, respectively. The granularity grows steeply as the threshold approaches 1.0, and it exceeds the granularity used at the maximum threshold (1.0), which is 10.
Is this intentional?
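To make the growth concrete, here is a minimal sketch that reproduces the reported values. It assumes the granularity for thresholds below 1.0 is derived from the ratio threshold / (1 - threshold); this is inferred from the numbers above, not copied from the library. The name uncappedQ is hypothetical, and rounding to the nearest integer is an assumption chosen so the results match the reported figures.

```go
package main

import (
	"fmt"
	"math"
)

// uncappedQ is a hypothetical stand-in for the uncapped branch of computeQ.
// Assumption: q = threshold / (1 - threshold), rounded to the nearest integer.
// Do not call it with threshold == 1.0: the denominator would be zero.
func uncappedQ(threshold float64) int {
	return int(math.Round(threshold / (1.0 - threshold)))
}

func main() {
	// Print the granularity for the thresholds mentioned in this report.
	for _, t := range []float64{0.9, 0.95, 0.99, 0.999} {
		fmt.Printf("threshold=%.3f q=%d\n", t, uncappedQ(t))
	}
}
```

Under this assumption the ratio has a pole at 1.0, so each extra 9 in the threshold multiplies the granularity by roughly ten.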
A problem caused by this is that when the threshold is set to 0.95 or greater, many licenses that are easily detected at a threshold of 0.9 are no longer detected.
I ran the program on around 17,300 license files. Around 2,950 BSD-3-Clause, 850 BSD-2-Clause, and some other licenses were not detected at all, although they were detected at a granularity of 10, because at that threshold the granularity is greater than 20 and nearly reaches 100.
A possible solution would be to set the granularity to 10 for any threshold greater than 0.9, which would also handle the divide-by-zero case.
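A minimal sketch of the proposed change, under the same assumption that the uncapped granularity is threshold / (1 - threshold) rounded to the nearest integer. The names cappedQ and maxQ are hypothetical and are not taken from the library. The lower bound of 1 is also an assumption, added so that low thresholds never produce a granularity of zero.

```go
package main

import (
	"fmt"
	"math"
)

// maxQ is the granularity this report observes at threshold 1.0.
const maxQ = 10

// cappedQ sketches the proposed fix: any threshold above 0.9 gets maxQ.
// Because 1.0 is covered by the cap, the ratio below is never evaluated
// with a zero denominator.
func cappedQ(threshold float64) int {
	if threshold > 0.9 {
		return maxQ
	}
	// Assumed formula for the remaining range, with a floor of 1.
	q := math.Round(threshold / (1.0 - threshold))
	return int(math.Max(1.0, q))
}

func main() {
	for _, t := range []float64{0.5, 0.75, 0.9, 0.95, 0.99, 1.0} {
		fmt.Printf("threshold=%.2f q=%d\n", t, cappedQ(t))
	}
}
```

With the cap in place, the granularity never exceeds 10 anywhere in the range 0.0 to 1.0, so raising the threshold from 0.9 to 0.95 no longer changes the granularity abruptly.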