Error and Skip Handling in Heuristic Malware Analysis
The use of the SKIP result in heuristics is unclear. In some places it is returned when errors occur, and in others when the heuristic does not apply. Separating the SKIP result from error handling will make results clearer.
Proposed definition of a SKIP result: a heuristic should return HeuristicResult.SKIP when its analysis is not applicable to the package. For example, when the SUSPICIOUS_SETUP heuristic is run on a package with no setup.py file, SKIP is the appropriate result: the heuristic does not apply, but the package information is not malformed.
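The following is a minimal Python sketch of the proposed separation. HeuristicResult mirrors the PASS/FAIL/SKIP values discussed in this issue. HeuristicAnalyzerError, analyze_setup, SUSPICIOUS_IMPORTS and the toy import check are hypothetical stand-ins and do not come from the codebase's actual API. In the sketch, SKIP means only "not applicable" (there is no setup.py). As an assumption about how errors could be surfaced, a setup.py that exists but cannot be parsed raises an exception rather than returning SKIP.

```python
from __future__ import annotations

import ast
from enum import Enum


class HeuristicResult(Enum):
    """Possible heuristic outcomes (mirrors PASS/FAIL/SKIP)."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


class HeuristicAnalyzerError(Exception):
    """Hypothetical error for package data that is malformed or cannot be analysed."""


# Hypothetical set of modules whose import in setup.py is treated as suspicious.
SUSPICIOUS_IMPORTS = {"base64", "socket", "subprocess"}


def analyze_setup(files: dict[str, str]) -> HeuristicResult:
    """Hypothetical SUSPICIOUS_SETUP-style check over a mapping of file path to source."""
    source = files.get("setup.py")
    if source is None:
        # Not applicable: the package simply has no setup.py, so SKIP.
        return HeuristicResult.SKIP
    try:
        tree = ast.parse(source, filename="setup.py")
    except SyntaxError as error:
        # setup.py exists but cannot be analysed: an error, not a SKIP.
        raise HeuristicAnalyzerError(f"could not parse setup.py: {error}") from error
    # Collect the top-level names of every imported module.
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
    return HeuristicResult.FAIL if imported & SUSPICIOUS_IMPORTS else HeuristicResult.PASS
```

With this split, analyze_setup({"pyproject.toml": ""}) returns HeuristicResult.SKIP. A syntactically invalid setup.py raises HeuristicAnalyzerError, so the caller can report it separately instead of silently skipping it.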
The appropriate and inappropriate uses of the SKIP result currently in the codebase are listed below:
Appropriate uses:
anomalous_version.py: returns a SKIP if the version cannot be interpreted according to PEP 440. This is fine: the package is not malformed, and the heuristic simply does not apply.
suspicious_setup.py: returns a SKIP if there is no setup.py, which is appropriate because the heuristic does not apply in this case. However, it does need to be refactored to raise an error when a setup.py is found but there is a problem analysing it.
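The anomalous_version case can be sketched in the same way. The sketch uses the canonical-version regular expression from PEP 440, Appendix B, instead of a full PEP 440 parser. This substitution is an assumption; the codebase may use something like packaging.version. As a result, valid but non-canonical strings such as "v1.0" would also be skipped. The name check_anomalous_version, the MAJOR_THRESHOLD value and the major-version check itself are hypothetical stand-ins for the real heuristic's logic.

```python
import re
from enum import Enum


class HeuristicResult(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


# Canonical public version regex from PEP 440, Appendix B.
# Group 1 is the optional epoch; group 2 is the first release segment.
CANONICAL_PEP440 = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)

# Hypothetical threshold: a first release segment at or above this is anomalous.
MAJOR_THRESHOLD = 100


def check_anomalous_version(version: str) -> HeuristicResult:
    """Hypothetical anomalous-version check that SKIPs uninterpretable versions."""
    match = CANONICAL_PEP440.match(version)
    if match is None:
        # Not interpretable under (canonical) PEP 440: the heuristic does not
        # apply, but the package metadata itself is not malformed.
        return HeuristicResult.SKIP
    major = int(match.group(2))
    return HeuristicResult.FAIL if major >= MAJOR_THRESHOLD else HeuristicResult.PASS
```

Here SKIP carries only the "not applicable" meaning. Nothing in this function needs to raise an error, because an unparseable version string is a valid, if unusual, input.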
Inappropriate uses:
closer_release_join_date.py: SKIP is returned if there are no maintainers or no latest release information. This would result from a malformed metadata file or a problem parsing the HTML page, and is therefore an error. Release and maintainer information must exist.
high_release_frequency.py: SKIP is returned if there are no releases or if there is only one release. No releases means the metadata is malformed; a single release means this heuristic should not have been run. Both cases constitute errors.
one_release.py: SKIP is returned if there are no releases. No releases means the metadata is malformed, which implies an error has occurred.
unchanged_release.py: SKIP is returned if there are no digests. This would occur if there are no releases, or if the releases have no digest fields. In both cases the metadata is malformed, so an error has occurred.
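The following Python sketch shows how three of these cases could raise an error instead of returning SKIP. The releases argument is loosely modelled on the PyPI JSON API's "releases" mapping (version to a list of file entries, each with a "digests" object). All function names, HeuristicAnalyzerError, the MIN_AVERAGE_GAP_DAYS threshold and the PASS/FAIL decision rules are hypothetical illustrations, not the actual implementations.

```python
from __future__ import annotations

from datetime import datetime
from enum import Enum


class HeuristicResult(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


class HeuristicAnalyzerError(Exception):
    """Hypothetical error raised when package metadata is malformed."""


def check_one_release(releases: dict[str, list[dict]]) -> HeuristicResult:
    if not releases:
        # A published package always has a release: empty means malformed metadata.
        raise HeuristicAnalyzerError("metadata contains no releases")
    return HeuristicResult.FAIL if len(releases) == 1 else HeuristicResult.PASS


# Hypothetical threshold: an average gap below this many days is suspicious.
MIN_AVERAGE_GAP_DAYS = 2.0


def check_release_frequency(release_dates: list[datetime]) -> HeuristicResult:
    if not release_dates:
        raise HeuristicAnalyzerError("metadata contains no releases")
    if len(release_dates) == 1:
        # This heuristic should only be run on packages with more than one release.
        raise HeuristicAnalyzerError("release frequency requested for a single-release package")
    ordered = sorted(release_dates)
    span_days = (ordered[-1] - ordered[0]).total_seconds() / 86400
    average_gap = span_days / (len(ordered) - 1)
    return HeuristicResult.FAIL if average_gap < MIN_AVERAGE_GAP_DAYS else HeuristicResult.PASS


def check_unchanged_release(releases: dict[str, list[dict]]) -> HeuristicResult:
    # Collect every sha256 digest across all files of all releases.
    digests = [
        file_info["digests"]["sha256"]
        for files in releases.values()
        for file_info in files
        if "sha256" in file_info.get("digests", {})
    ]
    if not digests:
        # No releases, or releases without digest fields: malformed metadata.
        raise HeuristicAnalyzerError("no release digests found in metadata")
    # The same artifact digest appearing more than once suggests an unchanged release.
    return HeuristicResult.FAIL if len(digests) != len(set(digests)) else HeuristicResult.PASS
```

Raising an exception rather than returning a result keeps SKIP reserved for "not applicable". It also lets the caller report malformed metadata as its own outcome. How the analyzer catches and surfaces these errors is left open here.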
I fixed this in #1059.