Improve context-detection heuristic for copyright notices in the license review tool
Our license review tool detects the context of each entry, so that for
/**
* Copyright 2001 Foo
*
* Blah Blah
*/
not only the line `* Copyright 2001 Foo`, but the full notice is selected.
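To make the idea concrete, here is a minimal sketch of how a matched line could be expanded to the surrounding comment. It is only an illustration, not the tool's actual implementation: the function name `expand_to_comment` and the naive "looks like a comment line" check are assumptions made for this example.

```python
def expand_to_comment(lines, index):
    """Return (start, end) indices of the contiguous comment block around lines[index].

    Hypothetical helper: a line counts as part of a comment if, after trimming
    whitespace, it starts with a common comment marker. Real sources need a
    smarter check (e.g. a '*' that is not inside a block comment).
    """
    def is_comment(line):
        # "*" also covers the closing "*/"; "/*" also covers "/**".
        return line.strip().startswith(("/*", "*", "//", "#"))

    start = index
    while start > 0 and is_comment(lines[start - 1]):
        start -= 1

    end = index
    while end + 1 < len(lines) and is_comment(lines[end + 1]):
        end += 1

    return start, end


source = [
    "package foo;",
    "/**",
    " * Copyright 2001 Foo",
    " *",
    " * Blah Blah",
    " */",
    "class A {}",
]
# The 'main' line is index 2; the context is the whole comment.
start, end = expand_to_comment(source, 2)
context = "\n".join(source[start:end + 1])
```

Here the main line at index 2 expands to lines 1 through 5, i.e. the whole `/** ... */` notice.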
However, we often encounter cases where the same 'main' line (the line that led to detecting the copyright mention) appears with different contexts.
For example:
Currently, if the same line maps to multiple 'contexts', the option to keep it together with the context is disabled (because it is ambiguous). It is generally confusing for developers when to use `Keep` vs `Keep with context`, so we should simplify the process to a binary decision: Keep or Ignore (mark as false positive).
To achieve that, we need to improve the heuristic:
- [ ] ideally it should match the full extent of the comment (as can be seen on the screenshot, the first and last entries are trimmed)
- [ ] context uniqueness should be less strict - a comment starting with `#` versus `//` versus `/*` containing the same plain text content should be treated as the same - we should trim whitespace and characters like `#=-/*` before comparing.
- after such de-duplication, we should then switch so that `Keep` just keeps all differing contexts
- [ ] merge `copyright-keep` with `copyright-keep-with-context`
- [ ] merge
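
The relaxed uniqueness check could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `normalize_context` and `deduplicate_contexts` are hypothetical, and the character set is the one suggested above (`#=-/*` plus whitespace). Lines that consist solely of comment characters (such as `/**`, ` *` or ` */`) are dropped, so the same notice written in different comment styles compares equal.

```python
# Characters trimmed from both ends of every line before comparing
# (assumption: the set proposed in this issue, plus whitespace).
COMMENT_CHARS = "#=-/* \t"


def normalize_context(context: str) -> str:
    """Reduce a comment to its plain text so comment styles do not matter."""
    cleaned = []
    for line in context.splitlines():
        text = line.strip(COMMENT_CHARS)
        if text:  # skip lines that held only comment decoration
            cleaned.append(" ".join(text.split()))  # collapse inner whitespace
    return "\n".join(cleaned)


def deduplicate_contexts(contexts):
    """Keep the first context seen for each distinct normalized text."""
    seen = {}
    for context in contexts:
        seen.setdefault(normalize_context(context), context)
    return list(seen.values())


block = "/**\n * Copyright 2001 Foo\n *\n * Blah Blah\n */"
hashes = "# Copyright 2001 Foo\n#\n# Blah Blah"
slashes = "// Copyright 2001 Foo\n//\n// Blah Blah"
other = "// Copyright 2001 Foo\n// Other text"

unique = deduplicate_contexts([block, hashes, slashes, other])
```

With these inputs, the three variants of the same notice collapse into one entry and only the genuinely different context remains alongside it, so `Keep` would have two contexts to keep instead of four. Note that trimming `-` and `=` from line ends can also remove characters that belong to the text itself; whether that is acceptable for comparison purposes is a design choice.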