scorecard Feature: Improved Security-Policy checks

Is your feature request related to a problem? Please describe.

The mere existence of a SECURITY.md file (or another searched-for file, such as an .adoc) containing one or more bytes earns the project the highest possible score (10).

Describe the solution you'd like

Grade the Security-Policy check on a scale, so that the score rises incrementally when SECURITY.md (or the file found) contains links to pages or email addresses. That means detecting https:, mailto:, or some other pattern (perhaps using a regex) in the contents of that file. There would be no need to check whether an https: link actually resolves (DNS) or whether a mailto: address bounces (SMTP). Perhaps:

no Security-Policy: score 0
one found link content: score += 3
more than one found link content: score += 6
more bytes than the sum of the length of all the link content found: score += 4

A score of 10 would then assume that the text surrounding the link content consists of intelligible words. For instance:

Sed ut perspiciatis unde omnis iste https://example.com/security or email [email protected] natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni

would score 10: it has two pieces of link content (score = 6), and its length (~379) is greater than the combined length of the two links (48 = 28 for the https link + 20 for the email), which adds 4 (score = 6 + 4). Had the content contained no link content, the final score would be 4; had it contained only one, the score would be 7.
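The proposal above can be sketched in a few lines of Python. This is only an illustration of the idea, not scorecard's implementation: the two regular expressions and the names `find_links` and `proposed_score` are assumptions, and `security@example.org` in the usage note is a placeholder address.

```python
import re

# Assumed patterns: the proposal only asks for "https:, or mailto:, or some
# other pattern", so these regexes are one possible choice.
URL_RE = re.compile(r"https?://[^\s<>()]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_links(text):
    """Return the unique URLs and email addresses in text, in order found."""
    links = URL_RE.findall(text) + EMAIL_RE.findall(text)
    return list(dict.fromkeys(links))  # de-duplicate while keeping order


def proposed_score(policy):
    """Score the contents of a policy file on the proposed 0-10 scale."""
    if not policy:
        return 0  # no Security-Policy file (or an empty one)
    links = find_links(policy)
    score = 0
    if len(links) == 1:
        score += 3
    elif len(links) > 1:
        score += 6
    # Reward content beyond the links themselves.
    if len(policy) > sum(len(link) for link in links):
        score += 4
    return score
```

With a text containing `https://example.com/security`, `security@example.org`, and surrounding words, `proposed_score` returns 10; with surrounding words but no links it returns 4, and with a single address plus words it returns 7. Nothing is resolved or contacted, in line with the proposal.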

In practice, ossf's own scorecard file would score 7 (it has one email address and many bytes beyond the length of that email). If ossf's scorecard is intended to be the exemplar, then perhaps another scale could be devised to detect mentions of timelines, days, and other disclosure practices, which would require language analysis.

Describe alternatives you've considered

If a project's score were non-zero, pull the appropriate security-policy file and inspect/parse it myself. scorecard's report already states "reason": "security policy file detected" but does not show the file, path, name, or URI of what was actually detected, so the burden is on the consumer to go hunting first and analyze later.

Additional context

The same could be said for the License and Signed-Releases checks: as long as the representative file in GitHub is 1 byte or more, the project is awarded full points. Although checking the validity of Signed-Releases is troublesome, a file one byte long seems too small for any signature. For License, more NLP might be called for, but with further investigation, known strings could be searched for and recommended to those who want to be scorecard friendly, such as those described at https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository or https://opensource.org/licenses

Aug 11 '22 22:08 shissam

Sounds like a good idea. One worry is false negatives: we want to be sure we don't miss certain patterns. Have you checked, say, 100 SECURITY.md files to validate that false negatives don't occur?

If you're interested in this, please feel free to send a PR once you've done some preliminary validation.

Aug 12 '22 17:08 laurentsimon

I can do that... when you say PR - sorry, my brain is not decoding, do you mean Problem Report?

Aug 12 '22 19:08 shissam

Pull Request https://docs.github.com/en/get-started/quickstart/github-glossary#pull-request

Aug 12 '22 19:08 naveensrinivasan

Once I get the org-level policy scanning working, I expect I'll be ready to make a pull request. The rules (criteria evaluation) were a "best guess", and I am curious what would be a good forum for getting feedback and likely better ideas; I'd like to have that baselined before the pull request as well. (Both points are explained a bit more below.)

@laurentsimon - so I did an experiment on 74 Kubernetes repos, 90 Apache repos, 90 Eclipse repos, and 42 OSSF repos for a total of 296 repos and 85 total scanned security policies.

36 Kubernetes repos have their own security policy (scanned); the remaining repos inherit the org-level security policy (not scanned)
11 Apache repos have their own security policy (scanned); the remaining repos inherit the org-level policy (not scanned)
Eclipse does not institute an org-level policy; 32 repos have a security policy (scanned) and the remaining repos have no security policy
OSSF does not institute an org-level policy; 6 repos have a security policy (scanned) and the remaining repos have no security policy

The results are in this google sheet: https://docs.google.com/spreadsheets/d/1pvIiUZRvDPgGbEEwpqGjK8HfgcprstBw-ZT45w1CSCg/edit?usp=sharing

caveats:

only repo-specific security policies were scanned (org-level scanning, which I called 'URL security policy file detected', is the next thing I need to implement; I think I know how)
I really tried not to be specific to one spoken language (i.e., English) but had to fall back on searching for word stems like “vuln” and “disclos” (not full words) for hints about disclosures and vulnerabilities
I tried to err on the side of false positives (generously awarding points) rather than false negatives (holding back points); this would be “no worse” than the existence test in the current scorecard

the rules:

criteria scoring details:

// score := 0
// #1: found one linked (email/http) content: score += 3
	rationale: someone to collaborate with or link to information (strong for community)
// #2: more than one unique (email/http) linked content found: score += 3
	rationale: if more than one link, even stronger for the community
// #3: more bytes than the sum of the length of all the linked content found: score += 3
	rationale: there appears to be information and context around all those links
// #4: found words and/or numbers (while ignoring units) which perhaps hint at vuln disclosure practices (more than one hit needed): score += 1
	rationale: works towards the intent of the security policy file regarding whom to contact about vulns and disclosures
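
As a rough sketch only, the four rules above might look like this in Python. This is not the code used for the experiment: the regular expressions, the name `rubric_score`, and the reading of rule #4 (count the stems "vuln" and "disclos" plus bare numbers, outside of links, and require more than one hit) are all assumptions made for illustration.

```python
import re

# Assumed link patterns; the rules only say "email/http" linked content.
URL_RE = re.compile(r"https?://[^\s<>()]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed hints for rule #4: the two word stems, plus any bare number
# (so "90 days" counts for its number, whatever the unit).
HINT_RE = re.compile(r"vuln|disclos|\b\d+\b", re.IGNORECASE)


def rubric_score(policy):
    """Apply rules #1-#4 to the contents of a security policy file."""
    if not policy:
        return 0  # no policy file found
    links = set(URL_RE.findall(policy)) | set(EMAIL_RE.findall(policy))
    # Remove links before hint matching so digits inside URLs do not count.
    remainder = EMAIL_RE.sub(" ", URL_RE.sub(" ", policy))

    score = 0
    if len(links) >= 1:  # rule #1: at least one link
        score += 3
    if len(links) > 1:  # rule #2: more than one unique link
        score += 3
    if len(policy) > sum(len(link) for link in links):  # rule #3
        score += 3
    if len(HINT_RE.findall(remainder)) > 1:  # rule #4: more than one hint
        score += 1
    return score
```

Under these assumptions, a policy with one email address, some surrounding text, and at least two hints scores 7; the same policy without hints scores 6; and a policy with two links but fewer than two hints scores 9.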

Of the 36 Kubernetes repos with their own security policy, no scores were reduced/changed from the current version of scorecard.

Of the 11 Apache repos with their own security policy, 3 repos' scores were reduced from 10 to 7 because those repos had only one "piece" of linked content (email/URL); 1 repo's score was reduced from 10 to 9 because the policy had 0 or 1 potential indicator(s) of vuln disclosure practices.

For the Eclipse repos (recall there is no Eclipse org-level policy):

58 repos scored 0 because there was no policy; this is unchanged from the current version of scorecard
1 repo's score was reduced from 10 to 6 because that repo had only one "piece" of linked content (email/URL) and had 0 or 1 potential indicator(s) of vuln disclosure practices
7 repos' scores were reduced from 10 to 7 because those repos had only one "piece" of linked content (email/URL)
24 repos scored 10 (no score change) because those repos have multiple pieces of linked content (email/URL) and expressed timelines.

For the OSSF repos (again, no org-level policy):

36 repos scored 0 because there was no policy; this is unchanged from the current version of scorecard
6 repos' scores were reduced from 10 to 7 because those repos had only one "piece" of linked content (email/URL)

Aug 17 '22 19:08 shissam

Thank you for this preliminary analysis. If there were a way to map a score to the piece of information that's missing, it would be even better, but I'm not sure it's doable here. (We do this for the branch protection check.)
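
One possible shape for such a mapping, sketched in Python under the assumption that each of the four rules from the earlier comment can be evaluated independently. The rule names, the message wording, and the function `explain` are hypothetical and are not part of scorecard.

```python
# Points per rule, and a hypothetical message for when the rule is not met.
RULES = {
    "has_link": (3, "no email address or URL found"),
    "has_multiple_links": (3, "fewer than two unique email addresses or URLs found"),
    "has_surrounding_text": (3, "no text beyond the links themselves"),
    "has_disclosure_hints": (1, "fewer than two hints of vulnerability disclosure practices"),
}


def explain(results):
    """Return (score, missing) for a dict mapping rule name to True/False."""
    score = 0
    missing = []
    for name, (points, message) in RULES.items():
        if results.get(name, False):
            score += points
        else:
            # Record what was missing and how many points it cost.
            missing.append(f"-{points}: {message}")
    return score, missing
```

For a policy that satisfies every rule except `has_multiple_links`, `explain` returns a score of 7 together with a single entry naming the missing second link, which is the kind of detail a consumer could act on.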

Let me cc @annabellegoth2boss, @SecurityCRob and @david-a-wheeler who have worked on the vulnerability disclosure guideline and may have additional feedback https://github.com/google/oss-vulnerability-guide/blob/main/guide.md

Aug 19 '22 18:08 laurentsimon
