
Improve context-detection heuristic for copyright notices in the license review tool

Status: Open. radeusgd opened this issue 9 months ago • 0 comments

Our license review tool detects the surrounding context of each copyright entry, so that for a notice such as

/**
  * Copyright 2001 Foo
  *
  * Blah Blah
  */

not only the line ' * Copyright 2001 Foo' but the full notice is selected.
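A minimal sketch of this kind of context expansion, assuming the context is simply the contiguous run of comment-looking lines around the detected copyright line. The names `is_comment_line`, `expand_context` and the prefix set are hypothetical and for illustration only; this is not the tool's actual implementation.

```python
# Hypothetical set of prefixes treated as "inside a comment".
COMMENT_PREFIXES = ("/*", "*", "//", "#")  # "*" also covers "*/" closers


def is_comment_line(line: str) -> bool:
    """Heuristic: a line is part of a comment if, once stripped, it starts with a comment marker."""
    return line.strip().startswith(COMMENT_PREFIXES)


def expand_context(lines: list[str], main_index: int) -> list[str]:
    """Return the contiguous block of comment lines surrounding lines[main_index]."""
    start = main_index
    while start > 0 and is_comment_line(lines[start - 1]):
        start -= 1  # walk upwards while we are still inside the comment
    end = main_index
    while end + 1 < len(lines) and is_comment_line(lines[end + 1]):
        end += 1  # walk downwards while we are still inside the comment
    return lines[start:end + 1]


source = ["/**", "  * Copyright 2001 Foo", "  *", "  * Blah Blah", "  */", "", "package foo;"]
print(expand_context(source, 1))  # the five lines of the comment block
```

Stopping at the first non-comment line (here the blank line) is what lets the whole notice be selected rather than only the copyright line; a real heuristic would need more care with code lines that happen to start with `*`.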

However, we often find that the same 'main' line (the line that led to detecting the copyright mention) is associated with several different contexts.

For example: [screenshot]

Currently, if the same line maps to multiple 'contexts', the option to keep it together with its context is disabled, because the choice would be ambiguous. It is also generally confusing for developers whether to use Keep or Keep with context, so we should simplify the process to a binary decision: Keep or Ignore (mark as a false positive).
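The ambiguity rule described above can be pictured roughly as follows. This is a sketch under the assumption that each finding is a `(main_line, context)` pair of strings; `contexts_by_main_line` and `keep_with_context_allowed` are hypothetical names, not the tool's real API.

```python
from collections import defaultdict


def contexts_by_main_line(findings):
    """Group (main_line, context) pairs, collecting the distinct contexts per main line."""
    grouped = defaultdict(set)
    for main_line, context in findings:
        grouped[main_line].add(context)
    return grouped


def keep_with_context_allowed(grouped, main_line):
    """'Keep with context' is only offered when the main line has exactly one context."""
    return len(grouped.get(main_line, ())) == 1


findings = [
    ("Copyright 2001 Foo", "/**\n * Copyright 2001 Foo\n */"),
    ("Copyright 2001 Foo", "# Copyright 2001 Foo"),
    ("Copyright 2020 Bar", "// Copyright 2020 Bar"),
]
grouped = contexts_by_main_line(findings)
print(keep_with_context_allowed(grouped, "Copyright 2001 Foo"))  # False: two contexts
print(keep_with_context_allowed(grouped, "Copyright 2020 Bar"))  # True: one context
```

Because the comparison is on raw text, the same notice written once in a `/** */` block and once after `#` already counts as two contexts, which is exactly the strictness the issue wants to relax.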

To achieve that, we need to improve the heuristic:

  • [ ] ideally, it should match the full extent of the comment (as the screenshot shows, the first and last entries are trimmed)
  • [ ] context uniqueness should be less strict: comments starting with #, // or /* that contain the same plain-text content should be treated as the same, so we should trim whitespace and characters like #=-/* before comparing.
  • after such de-duplication, Keep should simply keep all remaining distinct contexts
    • [ ] merge copyright-keep with copyright-keep-with-context
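The de-duplication and the merged Keep behaviour proposed in the list above could look roughly like this. It is a sketch, assuming "trim" means stripping whitespace and the characters `#=-/*` from both ends of every line and dropping lines that become empty; `normalize_context`, `distinct_contexts`, `apply_decision` and `Decision` are hypothetical names.

```python
from enum import Enum

# Characters the issue suggests ignoring at the edges of each line.
TRIM_CHARS = " \t#=-/*"


class Decision(Enum):
    KEEP = "keep"
    IGNORE = "ignore"  # mark as a false positive


def normalize_context(context: str) -> str:
    """Strip comment decoration per line so '#', '//' and '/*' variants compare equal."""
    lines = (line.strip(TRIM_CHARS) for line in context.splitlines())
    return "\n".join(line for line in lines if line)


def distinct_contexts(contexts):
    """De-duplicate by normalized text, keeping the first original form of each."""
    seen = {}
    for context in contexts:
        seen.setdefault(normalize_context(context), context)
    return list(seen.values())


def apply_decision(decision, contexts):
    """Keep retains every distinct context; Ignore drops the entry entirely."""
    if decision is Decision.KEEP:
        return distinct_contexts(contexts)
    return []


contexts = [
    "/**\n * Copyright 2001 Foo\n */",
    "# Copyright 2001 Foo",
    "// Copyright 2001 Foo\n// Blah Blah",
]
print(len(apply_decision(Decision.KEEP, contexts)))  # 2: the first two normalize identically
```

With this normalization the first two contexts collapse into one, so the only remaining decision per entry is Keep or Ignore, and Keep no longer needs a separate "with context" variant.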

radeusgd, Apr 25 '24 14:04