Philippe Ombredanne comments

Results: 987 comments by Philippe Ombredanne

RFC: Primary License. Create and maintain a primary declared_license_expression field.

This is not yet merged, but it is complete in PR #2961.
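As an illustration of what "a primary declared license expression" could mean, here is a minimal sketch. It assumes the primary field is derived by AND-combining every declared license expression detected for a package; `combine_expressions` and `PackageData` are hypothetical names, not the implementation in PR #2961.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def combine_expressions(expressions: List[str]) -> Optional[str]:
    """AND-combine license expressions into a single primary expression."""
    unique: List[str] = []
    for expression in expressions:
        expression = expression.strip()
        # Drop empty and duplicate expressions, keeping first-seen order.
        if expression and expression not in unique:
            unique.append(expression)
    if not unique:
        return None
    if len(unique) == 1:
        return unique[0]
    # Parenthesize compound expressions so AND does not change their meaning.
    parts = [f"({e})" if " " in e else e for e in unique]
    return " AND ".join(parts)


@dataclass
class PackageData:
    """Hypothetical, simplified package record for illustration only."""

    declared_license_expressions: List[str] = field(default_factory=list)

    @property
    def declared_license_expression(self) -> Optional[str]:
        # The primary field is derived from every detected declaration.
        return combine_expressions(self.declared_license_expressions)


package = PackageData(["mit", "apache-2.0 OR gpl-2.0", "mit"])
print(package.declared_license_expression)  # mit AND (apache-2.0 OR gpl-2.0)
```

This sketch deduplicates by plain string comparison; a real implementation would more likely parse and simplify expressions semantically, for instance with the `license-expression` library.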

packagecode: gradle nebula dependency lock parser

@tardyp Note that I have done quite a bit of research on how to parse Gradle builds, at least the Groovy kind, and we could likely consider the Kotlin kind...
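For a sense of the simplest case, here is a regex-based sketch that only extracts string-notation dependencies from a Groovy `build.gradle`. It is an assumption-laden approximation, not the approach from the research mentioned above: map notation, variables, string interpolation and classifiers are all ignored, and a robust parser would need an actual Groovy (or Kotlin) grammar. The pattern and function names are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for simple string-notation dependencies such as:
#   implementation 'com.google.guava:guava:31.1-jre'
#   testImplementation "junit:junit:4.13.2"
#   compileOnly('org.projectlombok:lombok')
# Map notation, variables, interpolation and classifiers are not handled.
DEPENDENCY = re.compile(
    r"""^[ \t]*(?P<scope>\w+)[ \t]*\(?[ \t]*['"]"""
    r"""(?P<group>[^:'"\s]+):(?P<name>[^:'"\s]+)(?::(?P<version>[^:'"\s]+))?['"]""",
    re.MULTILINE,
)


def parse_groovy_dependencies(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (scope, group, name, version) tuples; version is '' if absent."""
    return [
        (m.group("scope"), m.group("group"), m.group("name"), m.group("version") or "")
        for m in DEPENDENCY.finditer(text)
    ]


BUILD_GRADLE = """
plugins {
    id 'java'
}
dependencies {
    implementation 'com.google.guava:guava:31.1-jre'
    testImplementation "junit:junit:4.13.2"
    compileOnly('org.projectlombok:lombok')
}
"""

for dependency in parse_groovy_dependencies(BUILD_GRADLE):
    print(dependency)
```

Lines such as `id 'java'` are skipped because they lack the `group:name` colon, which is also why this is only a heuristic.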

packagecode: gradle nebula dependency lock parser

@tardyp FYI, @JonoYang is contributing some support for Gradle in #2822.

packagecode: gradle nebula dependency lock parser

I think we should first also support the standard Gradle lockfile: https://docs.gradle.org/current/userguide/dependency_locking.html
- Names: `gradle.lockfile` and `buildscript-gradle.lockfile`
- Content: This is an ini- or properties-like file:
  > Each line still...
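A minimal parser sketch, assuming the format as we read it in the linked Gradle docs: `#` comment lines, one `group:artifact:version=conf1,conf2` line per dependency, and an `empty=` line listing locked configurations that resolve to nothing. `buildscript-gradle.lockfile` presumably uses the same format. The function name and the sample content are illustrative, not taken from a real project.

```python
from typing import Dict, List, Tuple

SAMPLE = """\
# Illustrative sample, not copied from a real project
com.google.guava:guava:31.1-jre=compileClasspath,runtimeClasspath
junit:junit:4.13.2=testCompileClasspath,testRuntimeClasspath
empty=annotationProcessor
"""


def parse_gradle_lockfile(text: str) -> Tuple[List[Dict[str, object]], List[str]]:
    """Return (dependencies, empty_configurations) from lockfile content."""
    dependencies: List[Dict[str, object]] = []
    empty_configurations: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        coordinates, _, configurations = line.partition("=")
        configs = [c.strip() for c in configurations.split(",") if c.strip()]
        # "empty=" lists locked configurations that resolve to no dependency.
        if coordinates == "empty":
            empty_configurations.extend(configs)
            continue
        parts = coordinates.split(":")
        if len(parts) != 3:
            continue  # not group:artifact:version; ignored in this sketch
        group, name, version = parts
        dependencies.append(
            {"group": group, "name": name, "version": version, "configurations": configs}
        )
    return dependencies, empty_configurations


deps, empty = parse_gradle_lockfile(SAMPLE)
for dep in deps:
    print(dep)
print("empty:", empty)
```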

Improve gibberish license/copyright detection

That's an interesting class of errors! I guess they all come from binaries? And because detection in binaries is useful, we cannot stop doing it. Some remarks: These are at...

Improve gibberish license/copyright detection

Note that adopting https://github.com/nexB/pygmars/ as a replacement for NLTK should make it easier to reuse and integrate other libraries in the lexing process, including NER and gibberish detection....

Improve gibberish license/copyright detection

Another candidate for gibberish detection that works quite well is https://github.com/domanchi/gibberish-detector

Improve gibberish license/copyright detection

See:
- #2304
- #2403

Improve gibberish license/copyright detection

I ran this with [bad-copyright-detections.txt](https://github.com/nexB/scancode-toolkit/files/5985058/bad-copyright-detections.txt):
- `pip install gibberish-detector`
- `gibberish-detector train examples/big.txt > big.model`
- in Python:

```Python
from gibberish_detector import detector
Detector = detector.create_from_model('big.model')
data = sorted(set(open('bad-copyright-detections.txt').read().split()))
for...
```
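To our understanding, gibberish-detector scores text with a character-level Markov model trained on a corpus such as `big.txt`. The sketch below is a self-contained, simplified illustration of that general technique, not the library's code: it learns smoothed character-transition log-probabilities and flags tokens whose average transition log-probability falls below a threshold. The tiny repeated corpus stands in for a real training text, and the `-3.0` threshold is a hypothetical value that would need calibrating on known-good and known-bad samples.

```python
import math
from typing import Dict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
Model = Dict[str, Dict[str, float]]


def normalize(text: str) -> str:
    # Lowercase and keep only letters and spaces; drop everything else.
    return "".join(c for c in text.lower() if c in ALPHABET)


def train(corpus: str) -> Model:
    """Log-probability of each character given the previous character."""
    # Start every count at 1 (add-one smoothing) so unseen pairs are not -inf.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    text = normalize(corpus)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    model: Model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_log_prob(model: Model, text: str) -> float:
    """Mean log-probability per character transition (higher = more word-like)."""
    text = normalize(text)
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0  # too short to score; treated as not gibberish
    return sum(model[a][b] for a, b in pairs) / len(pairs)


def is_gibberish(model: Model, text: str, threshold: float = -3.0) -> bool:
    # The -3.0 default is a hypothetical value; calibrate it on real samples.
    return avg_log_prob(model, text) < threshold


CORPUS = (
    "the copyright holder and contributors provide this software "
    "as is without warranty of any kind "
) * 50
model = train(CORPUS)
for word in ["the copyright holder", "xqzvkjwq"]:
    print(word, round(avg_log_prob(model, word), 2), is_gibberish(model, word))
```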

Improve gibberish license/copyright detection

Very nice! What's your take on its applicability to license detection, then? Did you apply some boosting to legalese words?
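One hypothetical reading of "boosting to legalese words": over-weight legal vocabulary in the training text so that its character transitions score as word-like in a character-level model. The sketch below only builds such a boosted corpus and counts character bigrams to show the effect; `boosted_corpus`, `bigram_counts`, the word list and the boost factor are all illustrative assumptions.

```python
from collections import Counter
from typing import Iterable


def boosted_corpus(general_text: str, legalese: Iterable[str], boost: int = 10) -> str:
    """Append each legalese word `boost` times to a general training text."""
    extra = " ".join(word for word in legalese for _ in range(boost))
    return f"{general_text} {extra}"


def bigram_counts(text: str) -> Counter:
    """Count adjacent character pairs over lowercase letters and spaces."""
    text = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    return Counter(zip(text, text[1:]))


GENERAL = "the quick brown fox jumps over the lazy dog"
LEGALESE = ["warranty", "merchantability", "indemnify"]  # hypothetical word list

plain = bigram_counts(GENERAL)
boosted = bigram_counts(boosted_corpus(GENERAL, LEGALESE, boost=10))
print(plain[("b", "i")], boosted[("b", "i")])  # 0 10
```

The boosted text would then be used as the training corpus for whichever character model does the scoring, so rare legal terms such as "merchantability" are less likely to be flagged.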