Common scoring system for vulnerability test coverage?
When working on security issues, we have CVSS to gauge how severe a given issue is.
The problem is that when a fix for an issue is released, it's not obvious what kind of test coverage was employed to ensure that the fix actually fixes the issue, whether it fixes the general case and not only the specific one, or how extensive the test coverage for that issue is.
Secondly, when we consider issues in common protocols or data exchange formats, it's not uncommon for multiple implementations to have the same or similar issues. So having documentation that a CVE-XXXX-YYYYY-like issue from library Z isn't also present in libraries other than Z, because they test for it in "such, such, and such way", would also be really useful.
(Technically, this idea overlaps with other working groups, especially Best Practices, but I'm filing it here as I'd rather keep the scope focused on security at the beginning rather than on correctness in general.)
So, do you think this is the best workgroup to start work on this? If yes, what would you suggest as next steps?
@tomato42 Can you provide an example of what this would look like for some specific CVE and library? If I understand your description, you're proposing some formalized way to recognize a particular vulnerability that may affect multiple implementations (and thus have several CVEs assigned) that could then be used to show (and test) if library X is vulnerable or not. Perhaps something akin to this table of XML vulns and how they affected different Python XML parsers?
> @tomato42 Can you provide an example of what this would look like for some specific CVE and library?
tbh, I don't have a complete set of metrics in mind, but a few of the ones I think should be considered are:
- presence of a unit test of the affected method
- negative unit test of the affected method (see the sketch after this list)
- test coverage of the affected method (different kinds, like modified condition/decision coverage, table coverage, not just line coverage)
- parameter value coverage of the affected method (how many different classes of inputs there are vs how many are tested), i.e. property-based testing
- mutation score for the test cases that cover the affected method
- presence of fuzz tests for the method
  - how extensive they are
- presence of performance tests of the affected method or ones that use the affected method
- tests for timing side-channel of the affected method and code that uses it
- tests for memory access invariance of the affected method
- error count from static analysis tools for the affected method
- memory management checks of the affected method (no memory leaks, no unbounded memory growth, no uninitialised memory use)
- interoperability testing with other implementations
- integration tests that exercise the affected method
- presence of a formal machine-validated proof of correctness for the affected code
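To make a couple of these concrete, here's a minimal sketch (Python, using pytest and hypothesis) of what a positive unit test, a negative unit test, and a property-based test might look like; `parse_length` and its expected behaviour are invented purely for illustration, not taken from any real library:

```python
import struct

import pytest
from hypothesis import given, strategies as st


def parse_length(data: bytes) -> int:
    """Hypothetical method under test: read a 2-byte big-endian length
    prefix and check that the payload matches it."""
    if len(data) < 2:
        raise ValueError("truncated header")
    (length,) = struct.unpack(">H", data[:2])
    if len(data) - 2 != length:
        raise ValueError("length mismatch")
    return length


def test_parse_length_accepts_valid_input():
    # positive unit test of the affected method
    assert parse_length(b"\x00\x03abc") == 3


def test_parse_length_rejects_short_payload():
    # negative unit test: declared length exceeds the actual payload
    with pytest.raises(ValueError):
        parse_length(b"\x00\x10abc")


@given(st.binary(max_size=4096))
def test_parse_length_never_crashes(data):
    # property-based test: any input either parses cleanly or raises
    # ValueError; nothing else may escape (no IndexError, no struct.error)
    try:
        parse_length(data)
    except ValueError:
        pass
```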
So for many fixes/bugs the score would be rather low, and for many issues some of these things are completely irrelevant. I was thinking of an open-ended scale, starting at 0 for no tests and growing for better and better test coverage.
The problem is that some of them (like parameter coverage or mutation score) are more subjective than others.
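Purely as an illustration of that open-ended scale (the criteria list and the example coverage below are assumptions of mine, not a proposed standard), the simplest form of such a score could look like this:

```python
# Illustrative only: one point per criterion that the fix's test coverage
# satisfies.  The scale starts at 0 (no tests) and is open-ended, so new
# criteria can be added later without invalidating earlier scores.
CRITERIA = [
    "unit_test",           # positive unit test of the affected method
    "negative_unit_test",  # negative unit test of the affected method
    "property_based",      # parameter/input-class coverage
    "fuzzing",             # fuzz tests exercising the method
    "mutation_testing",    # mutation score measured for the tests
    "side_channel",        # timing side-channel / memory access tests
    "interop",             # interoperability tests with other stacks
]


def coverage_score(coverage):
    """Return how many of the criteria the fix's test coverage meets."""
    return sum(1 for criterion in CRITERIA if coverage.get(criterion, False))


# Example: a fix that shipped with only a regression test and a fuzz target.
print(coverage_score({"unit_test": True, "fuzzing": True}))  # -> 2
```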
> If I understand your description, you're proposing some formalized way to recognize a particular vulnerability that may affect multiple implementations (and thus have several CVEs assigned) that could then be used to show (and test) if library X is vulnerable or not.
well, I'd argue that if you have at least two implementations of the same format you can have the same bug in both of them
> Perhaps something akin to this table of XML vulns and how they affected different Python XML parsers?
I may do, but I'm not sure if it would be illustrative... also, I'm not familiar with them, so it would be hard for me to say how I should score them