Added matchedText variable to Scancode model and model mapper.

I am trying to add the matched texts to ORT's scan-result.json file. I made some changes and added a matchedText variable to ORT's model and model mapper. After making these changes and running the scanner, I get the following error message:


Exception in thread "main" kotlinx.serialization.MissingFieldException: Field 'matchedText' is required for type with serial name 'org.ossreviewtoolkit.plugins.scanners.scancode.FileEntry.Version3', but it was missing
	at kotlinx.serialization.internal.PluginExceptionsKt.throwMissingFieldException(PluginExceptions.kt:20)
	at org.ossreviewtoolkit.plugins.scanners.scancode.FileEntry$Version3.<init>(ScanCodeResultModel.kt:99)
	at org.ossreviewtoolkit.plugins.scanners.scancode.FileEntry$Version3$$serializer.deserialize(ScanCodeResultModel.kt:99)
	at org.ossreviewtoolkit.plugins.scanners.scancode.FileEntry$Version3$$serializer.deserialize(ScanCodeResultModel.kt:99)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:61)
	at kotlinx.serialization.json.internal.AbstractJsonTreeDecoder.decodeSerializableValue(TreeJsonDecoder.kt:52)
	at kotlinx.serialization.json.internal.TreeJsonDecoderKt.readPolymorphicJson(TreeJsonDecoder.kt:33)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:74)
	at kotlinx.serialization.json.internal.AbstractJsonTreeDecoder.decodeSerializableValue(TreeJsonDecoder.kt:52)
	at kotlinx.serialization.internal.TaggedDecoder.decodeSerializableValue(Tagged.kt:207)
	at kotlinx.serialization.internal.TaggedDecoder$decodeSerializableElement$1.invoke(Tagged.kt:279)
	at kotlinx.serialization.internal.TaggedDecoder.tagBlock(Tagged.kt:294)
	at kotlinx.serialization.internal.TaggedDecoder.decodeSerializableElement(Tagged.kt:279)
	at kotlinx.serialization.encoding.CompositeDecoder$DefaultImpls.decodeSerializableElement$default(Decoding.kt:538)
	at kotlinx.serialization.internal.CollectionLikeSerializer.readElement(CollectionSerializers.kt:80)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.readElement$default(CollectionSerializers.kt:51)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.merge(CollectionSerializers.kt:36)
	at kotlinx.serialization.internal.AbstractCollectionSerializer.deserialize(CollectionSerializers.kt:43)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:61)
	at kotlinx.serialization.json.internal.AbstractJsonTreeDecoder.decodeSerializableValue(TreeJsonDecoder.kt:52)
	at kotlinx.serialization.internal.TaggedDecoder.decodeSerializableValue(Tagged.kt:207)
	at kotlinx.serialization.internal.TaggedDecoder$decodeSerializableElement$1.invoke(Tagged.kt:279)
	at kotlinx.serialization.internal.TaggedDecoder.tagBlock(Tagged.kt:294)
	at kotlinx.serialization.internal.TaggedDecoder.decodeSerializableElement(Tagged.kt:279)
	at org.ossreviewtoolkit.plugins.scanners.scancode.ScanCodeResult$$serializer.deserialize(ScanCodeResultModel.kt:35)
	at org.ossreviewtoolkit.plugins.scanners.scancode.ScanCodeResult$$serializer.deserialize(ScanCodeResultModel.kt:35)
	at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:61)
	at kotlinx.serialization.json.internal.AbstractJsonTreeDecoder.decodeSerializableValue(TreeJsonDecoder.kt:52)
	at kotlinx.serialization.json.internal.TreeJsonDecoderKt.readJson(TreeJsonDecoder.kt:25)
	at kotlinx.serialization.json.Json.decodeFromJsonElement(Json.kt:127)
	at org.ossreviewtoolkit.plugins.scanners.scancode.ScanCodeResultParserKt.parseResult(ScanCodeResultParser.kt:85)
	at org.ossreviewtoolkit.plugins.scanners.scancode.ScanCodeResultParserKt.parseResult(ScanCodeResultParser.kt:37)
	at org.ossreviewtoolkit.plugins.scanners.scancode.ScanCode.createSummary(ScanCode.kt:182)
	at org.ossreviewtoolkit.scanner.CommandLinePathScannerWrapper.scanPath(CommandLinePathScannerWrapper.kt:38)
	at org.ossreviewtoolkit.scanner.Scanner.scanPath(Scanner.kt:597)
	at org.ossreviewtoolkit.scanner.Scanner.runPathScanners(Scanner.kt:447)
	at org.ossreviewtoolkit.scanner.Scanner.scan(Scanner.kt:178)
	at org.ossreviewtoolkit.scanner.Scanner$scan$3.invokeSuspend(Scanner.kt)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:280)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at org.ossreviewtoolkit.plugins.commands.scanner.ScannerCommand.runScanners(ScannerCommand.kt:226)
	at org.ossreviewtoolkit.plugins.commands.scanner.ScannerCommand.run(ScannerCommand.kt:139)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:306)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:319)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:40)
	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:458)
	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:455)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:475)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:482)
	at org.ossreviewtoolkit.cli.OrtMainKt.main(OrtMain.kt:85)

config.yml

ort:
  licenseFilePatterns:
    licenseFilenames: [
      'copying*',
      'copyright',
      'licence*',
      '*.licence',
      'license*',
      '*.license',
      'unlicence',
      'unlicense'
    ]

    patentFilenames: [
      'patents'
    ]

    rootLicenseFilenames: [
      'readme*'
    ]
    
  # Package curation providers are listed from highest to lowest priority. Technically, they are applied in reverse
  # order: The provider with the highest priority is applied last, so it can overwrite any previously applied curations.
  # https://github.com/oss-review-toolkit/ort/blob/979847bbb5a4558a7f8cbe2a7c5256600da913cb/model/src/main/resources/reference.yml#L39C4-L40C112
  packageCurationProviders:
  - type: DefaultFile
  - type: DefaultDir
  - type: File
    id: SomeCurationsDir
    options:
      path: /Users/iliou/workspace/tools/ort-configuration/curations
      mustExist: true
  - type: ClearlyDefined
    options:
      serverUrl: 'https://api.clearlydefined.io'
      minTotalLicenseScore: 80

  enableRepositoryPackageCurations: true
  enableRepositoryPackageConfigurations: true

  analyzer:
    allowDynamicVersions: true
    skipExcluded: true

  scanner:

    options:
      ScanCode:
        minVersion: '32.0.6'
        maxVersion: '32.0.6'
        commandLine: '--copyright --license --info --strip-root --timeout 300 --license-text'
        # FIXME: At the moment it's not possible to set maxVersion the same as minVersion
        # https://github.com/oss-review-toolkit/ort/issues/4789

    ignorePatterns: [
      '**/*.ort.yml',
      '**/*.spdx.yml',
      '**/*.spdx.yaml',
      '**/*.spdx.json',
      '**/HERE_NOTICE',
      '**/META-INF/DEPENDENCIES',
      '**/META-INF/DEPENDENCIES.txt',
      '**/META-INF/NOTICE',
      '**/META-INF/NOTICE.txt',
      '**/package-lock.json'
    ]

Dec 22 '23 12:12 dimitris-iliou

Can someone assist or guide me on the additional steps required to get this working? Any help is appreciated!

Dec 22 '23 12:12 dimitris-iliou

MissingFieldException: Field 'matchedText' is required for type with serial name 'org.ossreviewtoolkit.plugins.scanners.scancode.FileEntry.Version3', but it was missing

This means that you made matchedText a mandatory field in Version3 for which the file needs to provide a value, but the file has no value for matchedText. If matchedText is not always available, you need to make it optional by providing a default value for it.

But in any case, it seems to me (I've not looked deeply into it, it's just a gut feeling) that you've added matchedText in too many places.

Dec 22 '23 13:12 sschuberth
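The suggestion to make matchedText optional by giving it a default value can be sketched with kotlinx.serialization (this needs the kotlinx-serialization-json library and the serialization compiler plugin). LicenseMatchSketch and its fields are hypothetical simplifications, not ORT's actual FileEntry.Version3 model; the JSON key "matched_text" is what ScanCode writes when it runs with --license-text. Note that with the default Json configuration a nullable type alone is not enough: a property without a default value is still required, and a missing key then raises the MissingFieldException shown above.

```kotlin
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical, simplified license match; not ORT's actual ScanCode model.
@Serializable
data class LicenseMatchSketch(
    @SerialName("license_expression")
    val licenseExpression: String,
    val score: Double,
    // The default value makes this property optional during deserialization:
    // if "matched_text" is absent, null is used instead of throwing MissingFieldException.
    @SerialName("matched_text")
    val matchedText: String? = null
)

// Ignore the many other ScanCode fields that this sketch does not model.
val json = Json { ignoreUnknownKeys = true }

fun main() {
    val withoutText = json.decodeFromString(
        LicenseMatchSketch.serializer(),
        """{"license_expression": "mit", "score": 100.0, "start_line": 1}"""
    )
    val withText = json.decodeFromString(
        LicenseMatchSketch.serializer(),
        """{"license_expression": "mit", "score": 100.0, "matched_text": "MIT License"}"""
    )

    println(withoutText.matchedText) // null
    println(withText.matchedText)    // MIT License
}
```

The same pattern applies to every model class the field is added to, which is one reason to add it only where ScanCode actually provides it.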

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (86be29e) 67.08% compared to head (f79712a) 67.13%. Report is 45 commits behind head on main.

@@             Coverage Diff              @@
##               main    #8063      +/-   ##
============================================
+ Coverage     67.08%   67.13%   +0.04%     
- Complexity     2054     2056       +2     
============================================
  Files           357      357              
  Lines         17121    17167      +46     
  Branches       2457     2471      +14     
============================================
+ Hits          11486    11525      +39     
- Misses         4613     4618       +5     
- Partials       1022     1024       +2

Flag              Coverage Δ
funTest-docker    66.25% <ø> (+0.02%) ↑

Flags with carried forward coverage won't be shown.

Jan 08 '24 14:01 codecov[bot]

I managed to make it work: after testing, both scan-result.json and evaluator-result.json include the matchedText value. Could you please test my PR and verify?

Jan 15 '24 14:01 dimitris-iliou

Hi all, I was wondering whether adding matchedText to the evaluator results would blow up the size of the file and potentially cause memory issues in the reporter. What does everybody think about adding these matched texts to a separate file? For example, the file could contain a list of all violations and their corresponding matched texts; the violations would have the same format as those in the evaluator result, so that the two files can be joined.

Mar 13 '24 14:03 bennati
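The separate-file idea can be sketched in plain Kotlin, assuming each violation can be identified by a key made of rule name, package identifier and license. ViolationKey, Violation, MatchedTextEntry and joinMatchedTexts are hypothetical names, not ORT's RuleViolation API or any existing file format; the point is only that a shared key keeps the evaluator result small while the matched texts are joined in on demand.

```kotlin
// Hypothetical key identifying a rule violation (not ORT's actual RuleViolation API).
data class ViolationKey(val rule: String, val pkg: String, val license: String)

// Simplified, hypothetical entry of the evaluator result (no matched text).
data class Violation(val key: ViolationKey, val message: String)

// Entry of the hypothetical separate matched-texts file.
data class MatchedTextEntry(val key: ViolationKey, val matchedTexts: List<String>)

// Joins violations with their matched texts on the shared key.
// Assumes at most one MatchedTextEntry per key; violations without an entry get an empty list.
fun joinMatchedTexts(
    violations: List<Violation>,
    entries: List<MatchedTextEntry>
): List<Pair<Violation, List<String>>> {
    val textsByKey = entries.associate { it.key to it.matchedTexts }
    return violations.map { it to textsByKey[it.key].orEmpty() }
}

fun main() {
    val copyleft = ViolationKey("COPYLEFT_IN_SOURCE", "Maven:com.example:lib:1.0", "GPL-2.0-only")
    val unhandled = ViolationKey("UNHANDLED_LICENSE", "NPM::foo:2.0.0", "LicenseRef-unknown")

    val violations = listOf(
        Violation(copyleft, "GPL-2.0-only found in source code."),
        Violation(unhandled, "Unhandled license found.")
    )
    val entries = listOf(MatchedTextEntry(copyleft, listOf("GNU General Public License version 2")))

    joinMatchedTexts(violations, entries).forEach { (violation, texts) ->
        println("${violation.key.rule}: $texts")
    }
    // COPYLEFT_IN_SOURCE: [GNU General Public License version 2]
    // UNHANDLED_LICENSE: []
}
```

With this split, a reporter only has to load the side file when it actually renders matched texts.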

> Hi all, I was wondering whether adding matchedText to the evaluator results would blow up the size of the file and potentially cause memory issues in the reporter. What does everybody think about adding these matched texts to a separate file? For example, the file could contain a list of all violations and their corresponding matched texts; the violations would have the same format as those in the evaluator result, so that the two files can be joined.

I agree. I'd be OK with adding this matched text to the ScanCode parsing / model so that programmatic users can make use of it. But I'm against putting it into the OrtResult due to scalability / file size.

Mar 13 '24 14:03 fviernau
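Keeping matchedText in the ScanCode parsing model but out of the OrtResult amounts to a mapping that simply drops the field. The sketch below uses hypothetical, simplified classes, not ORT's actual ScanCode model or LicenseFinding; matchedText is nullable with a default because ScanCode only includes the matched text when run with --license-text.

```kotlin
// Hypothetical, simplified ScanCode license match as parsed from the raw JSON.
data class ScanCodeLicenseMatch(
    val licenseExpression: String,
    val startLine: Int,
    val endLine: Int,
    // Only present if ScanCode ran with --license-text, hence nullable with a default.
    val matchedText: String? = null
)

// Hypothetical, simplified finding as it would be stored in the OrtResult: no matched text.
data class LicenseFindingSketch(
    val license: String,
    val path: String,
    val startLine: Int,
    val endLine: Int
)

// Mapping into the OrtResult deliberately drops matchedText to keep the result small.
fun ScanCodeLicenseMatch.toFinding(path: String): LicenseFindingSketch =
    LicenseFindingSketch(licenseExpression, path, startLine, endLine)

// Programmatic users of the parsed model can still access the texts, grouped by license.
fun matchedTextsByLicense(matches: List<ScanCodeLicenseMatch>): Map<String, List<String>> =
    matches
        .mapNotNull { match -> match.matchedText?.let { match.licenseExpression to it } }
        .groupBy({ it.first }, { it.second })

fun main() {
    val matches = listOf(
        ScanCodeLicenseMatch("mit", 1, 3, "Permission is hereby granted, free of charge"),
        ScanCodeLicenseMatch("apache-2.0", 10, 12)
    )

    println(matches.map { it.toFinding("LICENSE") })
    println(matchedTextsByLicense(matches))
}
```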

> I was wondering whether adding matchedText to the evaluator results would blow up the size of the file and potentially cause memory issues in the reporter.

> I agree.

Me too.

> I'd be OK with adding this matched text to the ScanCode parsing / model so that programmatic users can make use of it. But I'm against putting it into the OrtResult due to scalability / file size.

I've done that now in https://github.com/oss-review-toolkit/ort/pull/8478.

As there are unfortunately still quite a few things wrong with this PR (for example, copyright findings should not have matched_text at all, and there are detekt findings), I propose to close this PR in favor of only merging https://github.com/oss-review-toolkit/ort/pull/8478.

Apr 02 '24 17:04 sschuberth

> As there are unfortunately still quite a few things wrong with this PR (for example, copyright findings should not have matched_text at all, and there are detekt findings), I propose to close this PR in favor of only merging #8478.

As #8478 was merged, let's close this.

> What does everybody think about adding these matched texts to a separate file?

As a follow-up, we might think about using the rather new storage for file lists to also store these matched texts.

Apr 03 '24 08:04 sschuberth
