scancode-toolkit
Instructions for Identifying Dependencies Unclear
The project description lists dependencies as one of the main categories of items Scancode detects; however, it is not clear from the wiki, the readme, or the command-line help how one actually uses Scancode to detect dependencies or whether this functionality is still missing or in development. Please update the readme or wiki to accurately reflect the state of dependency-checking in Scancode.
@kinxer Thanks for chiming in. This is a fair point: it is not well documented. Dependency detection comes from the package scan. I will make sure this is clear in the upcoming docs for release 2.0 (e.g. in the wiki, the CLI help and the README).
@kinxer In case this was not clear: it comes with the default scan or with the --package option.
@pombredanne Your previous reply made that clear; thank you for the replies. I do not, however, see how the packages section of the output is particularly useful for identifying dependencies. If that is a misunderstanding on my part, I ask that you make the documentation clear on that point.
At the moment, direct dependencies for npm (JS), Composer (PHP) and Maven (POMs) are collected in the "dependencies" subsection of a given package entry.
Others are in the works (such as Godeps and RubyGems in https://github.com/nexB/scancode-toolkit-contrib/tree/develop/src/packagedcode2).
BUT there is a bug... this is not wired correctly! For instance, you should see this for a Maven POM, but you do not:
```
$ ./scancode -p -f json-pp tests/packagedcode/data/m2/p6spy/p6spy/1.3/p6spy-1.3.pom
Scanning files for: packages with 1 process(es)...
Scanning files...
[####################] 1
Scanning done.
Scan statistics: 1 files scanned in 0s.
Scan options: packages with 1 process(es).
Scanning speed: 2.39 files per sec.
Scanning time: 0s.
Indexing time: 0s.
Saving results.
```
```json
{
  "scancode_notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
  "scancode_version": "2.0.0",
  "scancode_options": {
    "--package": true,
    "--license-score": 0,
    "--ignore": [],
    "--format": "json-pp"
  },
  "files_count": 1,
  "files": [
    {
      "path": "1.3/p6spy-1.3.pom",
      "scan_errors": [],
      "packages": [
        {
          "type": "Apache Maven",
          "name": "p6spy:p6spy",
          "version": "1.3",
          "primary_language": "Java",
          "packaging": "archive",
          "summary": "P6Spy",
          "description": "P6Spy is an open source framework for applications that intercept and optionally modify database statements.",
          "payload_type": null,
          "size": null,
          "release_date": null,
          "authors": [
            {
              "type": "person",
              "name": "Alan Arvesen",
              "email": "[email protected]",
              "url": null
            },
            {
              "type": "person",
              "name": "Bradley Johnson",
              "email": "[email protected]",
              "url": null
            },
            {
              "type": "person",
              "name": "Frank Quatro",
              "email": "[email protected]",
              "url": null
            },
            {
              "type": "person",
              "name": "Jeff Goke",
              "email": "[email protected]",
              "url": null
            },
            {
              "type": "person",
              "name": "thinknot",
              "email": "[email protected]",
              "url": null
            }
          ],
          "maintainers": [],
          "contributors": [],
          "owners": [],
          "packagers": [],
          "distributors": [],
          "vendors": [],
          "keywords": [],
          "keywords_doc_url": null,
          "metafile_locations": [],
          "metafile_urls": [],
          "homepage_url": "http://www.p6spy.com/",
          "notes": null,
          "download_urls": [],
          "download_sha1": null,
          "download_sha256": null,
          "download_md5": null,
          "bug_tracking_url": null,
          "support_contacts": [],
          "code_view_url": null,
          "vcs_tool": null,
          "vcs_repository": null,
          "vcs_revision": null,
          "copyright_top_level": null,
          "copyrights": [],
          "asserted_licenses": [
            {
              "license": "The P6Spy Software License, Version 1.1",
              "url": "http://cvs.sourceforge.net/viewcvs.py/*checkout*/p6spy/p6spy/license.txt?rev=HEAD",
              "text": null,
              "notice": null
            }
          ],
          "legal_file_locations": [],
          "license_expression": null,
          "license_texts": [],
          "notice_texts": [],
          "dependencies": {
            "compile": [
              {
                "name": "regexp:regexp",
                "version": null,
                "version_constraint": "1.3"
              },
              {
                "name": "gnu-regexp:gnu-regexp",
                "version": null,
                "version_constraint": "1.1.4"
              },
              {
                "name": "log4j:log4j",
                "version": null,
                "version_constraint": "1.2.8"
              },
              {
                "name": "ant:ant",
                "version": null,
                "version_constraint": "1.6.2"
              },
              {
                "name": "oracle:classes12",
                "version": null,
                "version_constraint": "9.2.0.5"
              },
              {
                "name": "jboss:jboss",
                "version": null,
                "version_constraint": "2.4.6"
              }
            ]
          },
          "related_packages": []
        }
      ]
    }
  ]
}
```
I need to fix this ASAP!
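As a rough illustration of where the direct dependencies live in the output above, here is a minimal Python sketch that walks a ScanCode 2.0 `json-pp` result. It assumes the 2.0 layout shown (`files` → `packages` → `dependencies`, keyed by a scope such as `compile`); the helper name `iter_direct_dependencies` and the trimmed `SAMPLE_SCAN` are illustrative only, and later ScanCode versions changed this output structure.

```python
import json

# Trimmed copy of the ScanCode 2.0 "json-pp" output shown above (illustrative only).
SAMPLE_SCAN = json.loads("""
{
  "files": [
    {
      "path": "1.3/p6spy-1.3.pom",
      "packages": [
        {
          "name": "p6spy:p6spy",
          "dependencies": {
            "compile": [
              {"name": "regexp:regexp", "version": null, "version_constraint": "1.3"},
              {"name": "log4j:log4j", "version": null, "version_constraint": "1.2.8"}
            ]
          }
        }
      ]
    }
  ]
}
""")


def iter_direct_dependencies(scan):
    """Yield (package, scope, dependency, constraint) tuples.

    Assumes the ScanCode 2.0 layout: each file entry has a "packages" list and
    each package has a "dependencies" mapping of scope -> list of entries.
    """
    for file_entry in scan.get("files", []):
        for package in file_entry.get("packages", []):
            # A package without detected dependencies may have an empty mapping.
            for scope, entries in (package.get("dependencies") or {}).items():
                for dep in entries:
                    yield (package["name"], scope, dep["name"],
                           dep.get("version_constraint"))


if __name__ == "__main__":
    for row in iter_direct_dependencies(SAMPLE_SCAN):
        print(row)
```

With a real scan, pass the parsed JSON (e.g. `json.load(open("scan.json"))`) instead of `SAMPLE_SCAN`.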
This latest commit fixes the missing Maven package collection. The others should work OK. Out of curiosity, which package managers/formats do you work with?
Mostly Maven and Pip. Thank you for the Maven update. It should be very helpful.
@kinxer Thank you ++ for bringing it up... I cannot fathom how the whole code was there but not wired in properly, and that there were no proper tests on the CLI and package recognition side :|
Note a couple of things:
- on the Python side, we need to improve the code for detecting installed packages (e.g. dist-info, egg-info, etc.) in #253 and properly add the dependencies in #653
- eventually there will also be a tool to resolve dependencies (including querying remote repos) in https://github.com/nexB/dependentcode/ : some details are in https://github.com/nexB/aboutcode/pull/2#issuecomment-282987036 and there is some ongoing discussion in https://github.com/nexB/dependentcode/issues/1
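For context on what installed-package detection involves on the Python side: installed distributions leave `*.dist-info` or `*.egg-info` metadata directories, and the standard library can read them. The minimal sketch below uses Python's `importlib.metadata` (3.8+) to list installed distributions and their declared `Requires-Dist` entries. It only illustrates the metadata involved and is not how ScanCode implements this; `installed_requirements` is a hypothetical helper name.

```python
from importlib import metadata


def installed_requirements():
    """Map each installed distribution name to its declared requirements.

    importlib.metadata reads both *.dist-info and *.egg-info metadata found
    on sys.path. Requirement strings are returned unparsed, so they may carry
    environment markers such as 'extra == "test"'.
    """
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            # Skip broken or partial metadata directories.
            continue
        # dist.requires is None when no Requires-Dist entries are declared.
        result[name] = list(dist.requires or [])
    return result


if __name__ == "__main__":
    for name, reqs in sorted(installed_requirements().items()):
        print(name, reqs)
```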
The Maven dependencies should now be properly collected in develop... We still need to add proper docs.
Some updates on how we handle dependencies now, reposted from https://github.com/aboutcode-org/scancode-toolkit/issues/3828#issuecomment-2288262995
Just a few updates:
- we now detect direct dependencies in manifests and lockfiles in ScanCode toolkit
- deplock in https://github.com/nexB/dependency-inspector/ can generate missing dependency lockfiles, which ScanCode toolkit can then parse (see the first point)
- PurlDB can scan the source and binaries of the packages and store the scan results
- ScanCode.io can detect the dependencies the same way ScanCode toolkit does, by parsing the lockfile (possibly generated by deplock)
- We can also match other undocumented dependencies using matchcode (backed by PurlDB signatures)
- ScanCode.io can also find "hidden" dependencies in binaries using the "map deploy to devel" pipeline.
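To make the direct/indirect distinction concrete: a manifest lists only direct dependencies, while a lockfile (such as one deplock might generate) typically pins the whole resolved tree. The sketch below is a generic illustration under that assumption, not ScanCode or deplock code: given a hypothetical resolved graph mapping each package to its own direct dependencies, a breadth-first walk from the project's direct dependencies yields the indirect (transitive) ones. `split_dependencies` and the package names are made up for illustration.

```python
from collections import deque


def split_dependencies(direct, graph):
    """Return (direct, indirect) dependency sets.

    direct: names declared in the project's manifest.
    graph: hypothetical resolved graph (e.g. derived from a lockfile),
           mapping each package name to the names it depends on.
    """
    seen = set(direct)
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for child in graph.get(current, ()):
            if child not in seen:  # the seen set also protects against cycles
                seen.add(child)
                queue.append(child)
    # Anything reachable that the manifest did not declare is indirect.
    return set(direct), seen - set(direct)


if __name__ == "__main__":
    # Illustrative package names, not real scan data.
    graph = {"app-lib": ["http-client"], "http-client": ["tls", "dns"], "tls": ["dns"]}
    print(split_dependencies(["app-lib"], graph))
```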
A simple process to scan all the dependencies:
- run deplock
- then scan your project in ScanCode.io to detect the packages
- also add the populate purldb pipeline: this will trigger a full source and binary scan of all the dependencies
- enrich the scan results with a purldb lookup
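One way to script the second and third steps above is through ScanCode.io's REST API. The sketch below only builds the request and does not send it. It assumes a `POST /api/projects/` endpoint accepting `name`, `input_urls`, `pipeline` (as a list) and `execute_now`, token authentication, and the pipeline names `inspect_packages` and `populate_purldb`. All of these are assumptions to check against the API docs of your ScanCode.io version; the host, token, input URL and the `build_project_request` helper are hypothetical placeholders.

```python
import json
import urllib.request

# Placeholders / assumptions: adjust to your ScanCode.io instance and version.
SCANCODEIO_URL = "https://scancodeio.example.com"  # hypothetical host
API_TOKEN = "<your-api-key>"                         # hypothetical token


def build_project_request(name, input_url, pipelines):
    """Build (but do not send) a project-creation request for ScanCode.io.

    Assumes POST /api/projects/ accepts "name", "input_urls", "pipeline"
    (here a list of pipeline names) and "execute_now", with token
    authentication. Verify these against your instance's API docs.
    """
    payload = {
        "name": name,
        "input_urls": [input_url],
        "pipeline": list(pipelines),
        "execute_now": True,
    }
    return urllib.request.Request(
        url=f"{SCANCODEIO_URL}/api/projects/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {API_TOKEN}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Assumed pipeline names: detect packages/dependencies, then populate PurlDB.
    req = build_project_request(
        "my-project",
        "https://example.com/my-project-with-lockfiles.tar.gz",  # hypothetical input
        ["inspect_packages", "populate_purldb"],
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```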
@pombredanne From the above, can a package's direct dependencies be detected with the --package scan?
For the indirect dependencies, I don't quite understand the process:
- run deplock
- then scan your project in ScanCode.io to detect the packages
- also add the populate purldb pipeline: this will trigger a full source and binary scan of all the dependencies
- enrich the scan results with a purldb lookup
- How do I run deplock? What is the input and what is the output?
- Should I create a SCIO project with the output from above and run some pipelines? Which pipelines should I be running?
- Do I need to run anything in PurlDB?