Snippet Model cannot be mapped from FossID multiline range matching
Currently, for performance reasons (see https://github.com/oss-review-toolkit/ort/issues/7028), the matched lines are not fetched from FossID.
However, the current snippet model does not seem capable of representing certain FossID responses.
For instance, for a given snippet match in FossID, I get the following response from the `listMatchedLines` function (truncated for clarity):
"data": {
  "local_file": {
    "1": "1",
    "2": "2",
    "3": "3",
    "4": "4",
    "5": "5",
    "6": "6",
    "7": "7",
    [...]
    "19": "19",
    "20": "20",
    "21": "21",
    "22": "22",
    "23": "23",
    "24": "24",
    "45": "45",
    "46": "46",
    "47": "47",
    "48": "48",
    "49": "49",
    "50": "50",
    "51": "51",
    [...]
    "86": "86",
    "87": "87",
    "88": "88",
    "89": "89",
    "90": "90",
    "91": "91",
    [...]
    "673": "673",
    "674": "674",
    "675": "675"
  }
}
`local_file` lists the lines of the local source file that were matched.
If these lines are compressed into contiguous line ranges, the result is two ranges: 1-24 and 45-675.
This information is supposed to go into `SnippetFinding.sourceLocation`:
https://github.com/oss-review-toolkit/ort/blob/6f6e91759730ec20dc172b60612d5d76ab35c232/model/src/main/kotlin/SnippetFinding.kt#L31
Unfortunately, `TextLocation` in ORT can only carry two integers for the line information (`startLine` and `endLine`), i.e. a single contiguous range.
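To make the loss concrete: with only a start and an end line, the two ranges can be approximated either by the covering span 1-675, which also claims the 20 unmatched lines 25-44, or by the first range 1-24, which drops all 631 matched lines of 45-675. The sketch below checks this. `SimpleTextLocation`, `extraLines` and `missedLines` are hypothetical; `SimpleTextLocation` is a simplified stand-in that mirrors only the path and the two line fields of ORT's `TextLocation`.

```kotlin
// Hypothetical, simplified stand-in for ORT's TextLocation: only the path and the
// two line integers are mirrored here.
data class SimpleTextLocation(val path: String, val startLine: Int, val endLine: Int) {
    val lines: IntRange get() = startLine..endLine
}

// Lines the single span claims although FossID did not report them as matched.
fun extraLines(location: SimpleTextLocation, matched: List<IntRange>): List<Int> =
    location.lines.filter { line -> matched.none { line in it } }

// Lines FossID reported as matched that the single span does not cover.
fun missedLines(location: SimpleTextLocation, matched: List<IntRange>): List<Int> =
    matched.flatMap { range -> range.filter { it !in location.lines } }
```

With `matched = listOf(1..24, 45..675)`, a location spanning 1-675 yields `extraLines` of 25..44 and no missed lines, while a location spanning 1-24 yields no extra lines but 631 missed ones.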
So what can be done?
- Split the snippet match into two `SnippetFinding`s, one per range? This could get messy for more complicated matches.
- Replace the `TextLocation` of the snippet finding with `TextRangeLocation`, a new class that can specify multiple line ranges?
- Only map the first range to the `TextLocation` and ignore the rest (or store it in another property)?
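The three options could look roughly like this. This is a hypothetical sketch, not ORT code: `SpanLocation` stands in for the current single-span `TextLocation`, `splitPerRange` illustrates option 1, `TextRangeLocation` (the class name proposed above) illustrates option 2, and its `firstSpan` helper illustrates option 3.

```kotlin
// Hypothetical sketches only; none of these types or functions exist in ORT.

// Stand-in for the current single-span location (path + start/end line).
data class SpanLocation(val path: String, val startLine: Int, val endLine: Int)

// Option 1: split one multi-range match into one location per range; a caller
// would then emit one SnippetFinding per returned location.
fun splitPerRange(path: String, ranges: List<IntRange>): List<SpanLocation> =
    ranges.map { SpanLocation(path, it.first, it.last) }

// Option 2: a location type that keeps all ranges of a single match together.
data class TextRangeLocation(val path: String, val ranges: List<IntRange>) {
    init {
        require(ranges.isNotEmpty()) { "At least one line range is required." }
        require(ranges.none { it.isEmpty() }) { "Line ranges must not be empty." }
    }

    // Smallest single span covering all ranges, for consumers that only
    // understand a start and an end line.
    fun boundingSpan(): SpanLocation =
        SpanLocation(path, ranges.minOf { it.first }, ranges.maxOf { it.last })

    // Option 3: keep only the first range; the remaining ones would have to be
    // stored in some other (hypothetical) property.
    fun firstSpan(): SpanLocation =
        SpanLocation(path, ranges.first().first, ranges.first().last)
}
```

Option 2 keeps one finding per match and can still derive a single span for consumers that only handle start and end lines, at the cost of changing the model; option 1 leaves the model untouched but multiplies findings for fragmented matches.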
As a side note, ScanOSS seems to deliver a single range:
"lines": "1-710",
"oss_lines": "1-710",
However, this is only an assumption, as I could not reproduce more complex matching cases.
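If ScanOSS indeed always reports a single `"start-end"` string, mapping it onto the existing two-integer location stays simple. A minimal sketch, assuming exactly that format; `parseLineSpec` is a hypothetical helper. It deliberately rejects anything other than one range (for example a comma-separated list of ranges, whose existence in ScanOSS responses is not confirmed) instead of silently truncating it.

```kotlin
// Hypothetical helper: parses a ScanOSS-style "start-end" line specification
// such as "1-710" into an inclusive IntRange. Only the single-range form is
// accepted; any other shape fails fast.
fun parseLineSpec(spec: String): IntRange {
    val parts = spec.trim().split('-')
    require(parts.size == 2) { "Expected 'start-end', got '$spec'." }

    // toInt() throws NumberFormatException on non-numeric input.
    val start = parts[0].trim().toInt()
    val end = parts[1].trim().toInt()
    require(start in 1..end) { "Invalid line range '$spec'." }

    return start..end
}
```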