Philippe Ombredanne
This is not merged yet, but it is complete in PR #2961.
@tardyp note that I have done quite a bit of research on how to parse Gradle builds, at least the Groovy kind, and we could likely consider the Kotlin kind...
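To illustrate how the Groovy and Kotlin DSL flavors differ for the common case, here is a deliberately naive sketch that pulls `configuration "group:name:version"` declarations out of a build script with one regex. It is not the approach from the research mentioned above, and `extract_dependencies` is a hypothetical helper. It only handles literal, single-line, three-part coordinates: no variable interpolation, no `group:`/`name:` map syntax, no classifiers, no platforms or version catalogs.

```python
import re
from typing import List, Tuple

# Hypothetical, naive pattern: a configuration name followed by a quoted
# "group:name:version" string, either Groovy style (implementation 'g:a:v')
# or Kotlin DSL style (implementation("g:a:v")). [ \t]* avoids crossing lines.
DEP_PATTERN = re.compile(
    r"""^[ \t]*(?P<config>[A-Za-z]+)[ \t]*\(?[ \t]*["']"""
    r"""(?P<group>[^:"']+):(?P<name>[^:"']+):(?P<version>[^:"']+)["'][ \t]*\)?""",
    re.MULTILINE,
)


def extract_dependencies(build_script: str) -> List[Tuple[str, str, str, str]]:
    """Return (configuration, group, name, version) tuples found in a build script."""
    return [
        (m.group("config"), m.group("group"), m.group("name"), m.group("version"))
        for m in DEP_PATTERN.finditer(build_script)
    ]


if __name__ == "__main__":
    groovy = "dependencies {\n    implementation 'org.apache.commons:commons-lang3:3.12.0'\n}\n"
    kotlin = 'dependencies {\n    implementation("com.google.guava:guava:31.1-jre")\n}\n'
    print(extract_dependencies(groovy))
    print(extract_dependencies(kotlin))
```

A real parser would need to evaluate or at least tokenize the script, since Gradle builds are programs; the regex only shows that both DSLs share the same coordinate string.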
@tardyp FYI @JonoYang is contributing some support for Gradle in #2822
I think we should also first support the standard Gradle lockfile: https://docs.gradle.org/current/userguide/dependency_locking.html
- Names: `gradle.lockfile` and `buildscript-gradle.lockfile`
- Content: This is an ini or properties-like file:
  > Each line still...
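A minimal parsing sketch, assuming the single-file lockfile format shown in the Gradle docs linked above: `#` comment lines, one `group:name:version=configuration, configuration` entry per line, and an `empty=` line listing locked configurations that have no dependencies. `buildscript-gradle.lockfile` is assumed to use the same format. `parse_gradle_lockfile` is a hypothetical name, and the older per-configuration lockfiles are not handled.

```python
from typing import Dict, List, Set, Tuple

Coordinate = Tuple[str, str, str]  # (group, name, version)


def parse_gradle_lockfile(text: str) -> Tuple[Dict[Coordinate, List[str]], Set[str]]:
    """Parse gradle.lockfile content.

    Returns a mapping of (group, name, version) -> configurations, plus the set of
    configurations listed on the "empty=" line (locked, but with no dependencies).
    """
    dependencies: Dict[Coordinate, List[str]] = {}
    empty_configurations: Set[str] = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the "#" comment header that Gradle writes.
        if not line or line.startswith("#"):
            continue
        # Properties-like: the key is left of the first "=", configurations right of it.
        key, _, value = line.partition("=")
        configurations = [c.strip() for c in value.split(",") if c.strip()]
        if key == "empty":
            empty_configurations.update(configurations)
            continue
        parts = key.split(":")
        if len(parts) != 3:
            # Not a group:name:version coordinate; ignore it rather than guess.
            continue
        group, name, version = parts
        dependencies[(group, name, version)] = configurations
    return dependencies, empty_configurations


if __name__ == "__main__":
    sample = (
        "# This is a Gradle generated file for dependency locking.\n"
        "org.springframework:spring-core:5.0.5.RELEASE=compileClasspath, runtimeClasspath\n"
        "empty=annotationProcessor\n"
    )
    deps, empty = parse_gradle_lockfile(sample)
    print(deps)
    print(empty)
```

Keeping the configurations per coordinate makes it possible to later tell runtime dependencies from build-only ones (for instance `compileClasspath` vs `runtimeClasspath`).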
That's an interesting class of errors! I guess they all come from binaries? And because detection in binaries is useful, we cannot stop doing it. Some remarks: These are at...
Note that adopting https://github.com/nexB/pygmars/ as a replacement for NLTK should make it easier to reuse and integrate other libraries in the lexing process, including NER and gibberish detection...
Another candidate for gibberish detection that works quite well is https://github.com/domanchi/gibberish-detector
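For context on how this kind of detector can work, here is a stdlib-only sketch of the character-bigram Markov-chain idea commonly used for gibberish detection: learn how likely each character is to follow another in normal text, then score a string by its average transition log-probability. This is an illustration of the general technique, not the gibberish-detector library's code or API; `train` and `avg_log_prob` are hypothetical names, and any threshold for "gibberish" would have to be tuned on real data.

```python
import math
from typing import Dict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
Model = Dict[str, Dict[str, float]]


def normalize(text: str) -> str:
    # Keep only lowercase letters and spaces so every character has a row in the model.
    return "".join(c for c in text.lower() if c in ALPHABET)


def train(corpus: str) -> Model:
    """Estimate log P(next char | current char) with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}  # add-one smoothing
    text = normalize(corpus)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    model: Model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def avg_log_prob(model: Model, text: str) -> float:
    """Average transition log-probability; lower means more gibberish-like."""
    text = normalize(text)
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0
    return sum(model[a][b] for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    model = train("the copyright holder of this software grants permission to copy it")
    print(avg_log_prob(model, "copyright holder"))
    print(avg_log_prob(model, "xqzvkjwpq zxq"))
```

The training corpus matters a lot: a model trained on general English (like `big.txt`) may rate legal or technical vocabulary lower than one trained on legal text.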
See:
- #2304
- #2403
I ran this with [bad-copyright-detections.txt](https://github.com/nexB/scancode-toolkit/files/5985058/bad-copyright-detections.txt):
- `pip install gibberish-detector`
- `gibberish-detector train examples/big.txt > big.model`
- in Python:

```Python
from gibberish_detector import detector

# Load the model trained with the command above.
Detector = detector.create_from_model('big.model')
# Unique, sorted, whitespace-separated tokens from the bad detections.
data = sorted(set(open('bad-copyright-detections.txt').read().split()))
for...
```
Very nice! What's your take on its applicability to licenses, then? Did you apply some boosting to legalese words?