Bug in the computeQ function - v2 classifier

Open bharat-biradar opened this issue 3 years ago • 0 comments

Describe the issue

In computeQ, when the threshold is set to 1.0 the granularity is calculated as 10, but when the threshold is set to 0.95, 0.99, or 0.999 the granularity is calculated as 19, 99, and 999, respectively. The granularity grows very quickly as the threshold approaches 1.0, and each of these values is greater than the granularity of 10 used at the maximum threshold (1.0).

Is this intentional?

As a result, when the threshold is set to 0.95 or greater, many licenses are not detected, even though the same licenses are easily detected with a threshold of 0.9.

I ran the program on around 17,300 license files. Around 2,950 BSD-3-Clause files, 850 BSD-2-Clause files, and some other licenses were not detected at all, although they were detected at a granularity of 10. This appears to be because at those thresholds the granularity is greater than 20 and nearly reaches 100.

A possible solution would be to set the granularity to 10 for any threshold greater than 0.9, which would also handle the divide-by-zero case.
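To make the proposal concrete, here is a minimal sketch of the capped calculation. It is not copied from the library: the names currentQ and cappedQ are hypothetical, and currentQ assumes the granularity is threshold/(1-threshold) truncated to an integer, with a floor of 1 and a fixed value of 10 at a threshold of 1.0, which is what the numbers above suggest.

```go
package main

import "fmt"

// currentQ approximates the behaviour described in this issue.
// ASSUMPTION: the formula and the floor of 1 are inferred from the
// reported values, not taken from the library source.
func currentQ(threshold float64) int {
	if threshold == 1.0 {
		return 10 // special case that avoids dividing by zero
	}
	q := int(threshold / (1.0 - threshold)) // conversion truncates toward zero
	if q < 1 {
		return 1
	}
	return q
}

// cappedQ applies the proposed change: every threshold above 0.9 uses a
// granularity of 10, so the division is never reached near 1.0.
func cappedQ(threshold float64) int {
	if threshold > 0.9 {
		return 10
	}
	return currentQ(threshold)
}

func main() {
	// Print both variants side by side for a few thresholds.
	for _, t := range []float64{0.8, 0.9, 0.95, 0.99, 0.999, 1.0} {
		fmt.Printf("threshold=%.3f current=%d capped=%d\n", t, currentQ(t), cappedQ(t))
	}
}
```

Because of floating-point rounding, the truncated values printed by currentQ may differ by one from the figures quoted above. With the cap in place, the separate check for 1.0 is only reached through currentQ and could be dropped.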

bharat-biradar, Aug 20 '21 06:08