
Add one more step to select labels which can be predicted

Open abcdefgs0324 opened this issue 5 years ago • 1 comments

Currently, a label can be predicted only if it satisfies both the precision threshold and the recall threshold (0.7 and 0.5 by default, respectively). Depending on the repository, this can result in low label coverage.

One option is to choose labels in two steps:

  1. Choose labels that satisfy both the precision and recall thresholds (the current method).
  2. From the remaining labels, also pick those that meet the precision threshold (a new step).
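The two steps can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `select_labels`, the `metrics` mapping, and the example label names are all hypothetical, and the defaults are the 0.7 and 0.5 thresholds mentioned above.

```python
def select_labels(metrics, precision_min=0.7, recall_min=0.5):
    """Split labels into the two proposed groups.

    `metrics` is a hypothetical mapping of label -> (precision, recall)
    measured on held-out issues.
    """
    # Step 1 (current method): both thresholds must be met.
    step1 = [
        label
        for label, (precision, recall) in metrics.items()
        if precision >= precision_min and recall >= recall_min
    ]
    # Step 2 (proposed): among the remaining labels, precision alone is enough.
    step2 = [
        label
        for label, (precision, recall) in metrics.items()
        if label not in step1 and precision >= precision_min
    ]
    return step1, step2


# Made-up metrics for three labels.
example = {
    "kind/bug": (0.80, 0.60),       # passes both thresholds
    "kind/feature": (0.75, 0.20),   # passes precision only
    "kind/question": (0.50, 0.90),  # fails precision
}
both, precision_only = select_labels(example)
```

With these made-up numbers, `kind/bug` is selected in step 1, `kind/feature` is added by step 2, and `kind/question` stays excluded because it misses the precision threshold.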

The reason for the second step is that maintainers may care more about false positives than about missed labels. It is therefore reasonable to include every label that meets the precision threshold, even if it is seldom predicted.

However, there is a trade-off between precision and recall: in the second step, maximizing precision is likely to minimize recall. In my opinion, we should choose each label's probability threshold so that its precision is above, but close to, the precision threshold, because that threshold is the minimum acceptable value. Only labels that satisfy the precision threshold would then be eligible for prediction.
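As a rough sketch of that idea, the snippet below scans candidate probability thresholds for a single label and keeps the one whose precision is at or above the minimum but as close to it as possible, breaking ties in favor of higher recall. The function `pick_threshold` and its inputs are hypothetical, and treating each distinct score as a candidate threshold is an assumption made for illustration.

```python
def pick_threshold(y_true, scores, precision_min=0.7):
    """Return (threshold, precision, recall) for one label, or None.

    `y_true` holds 0/1 ground truth and `scores` the predicted
    probabilities for the same issues (hypothetical inputs).
    """
    best = None
    for t in sorted(set(scores)):
        pairs = list(zip(y_true, scores))
        tp = sum(1 for y, s in pairs if s >= t and y == 1)
        fp = sum(1 for y, s in pairs if s >= t and y == 0)
        fn = sum(1 for y, s in pairs if s < t and y == 1)
        if tp + fp == 0:
            continue  # nothing predicted at this threshold
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision < precision_min:
            continue  # below the minimum acceptable precision
        # Prefer the lowest qualifying precision, then the highest recall.
        key = (precision, -recall)
        if best is None or key < best[0]:
            best = (key, t, precision, recall)
    if best is None:
        return None  # label cannot meet the precision threshold
    _, t, precision, recall = best
    return t, precision, recall


# Made-up scores for six issues, three of which truly carry the label.
result = pick_threshold([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
```

A label for which the function returns `None` would simply not be predicted, which matches the rule that only labels satisfying the precision threshold are included.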

abcdefgs0324 · Sep 13 '19 20:09

Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.85. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] · Sep 13 '19 20:09