almanac.httparchive.org

Potentially Misleading CSP Queries

Open ddworken opened this issue 9 months ago • 7 comments

I was taking a look at https://almanac.httparchive.org/en/2024/security#fig-16 and I think that the way the CSP stats are calculated may be slightly misleading relative to how CSP is used in the wild. Specifically, I have two concerns:

  1. csp_script_source_list_keywords.sql is counting how many policies contain unsafe-inline. While the SQL code is technically correct, this overcounts the prevalence of CSPs that actually allow unsafe-inline, since that keyword is automatically ignored if a policy has strict-dynamic. This means that the recommended way of creating a backwards-compatible strict CSP is getting flagged as containing unsafe-inline.
    • I'll say that at Google, we essentially always include unsafe-inline in our nonce-based CSPs, which likely leads to an inflation of this metric.
  2. csp_script_source_list_keywords.sql selects the CSP headers using UNNEST(response_headers). This means that if a page sets two CSP headers (e.g. one strict nonce-based policy and one allowlist-based policy), those will get counted separately. From a web security POV, what really matters is that a given page has a nonce-based policy, so it seems to me like it would be ideal to have a metric that says "What percent of pages have a nonce-based CSP?" rather than a metric that says "What percent of CSPs are nonce-based?"
    • Similarly, at Google many of our services set both a nonce-based policy and an allowlist-based policy so I suspect we're also contributing towards this metric calculation method being sub-optimal.
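
To illustrate the first concern, here is a minimal Python sketch of the "declared vs. effective" distinction for unsafe-inline. It assumes the rule from the CSP Level 3 specification that unsafe-inline is ignored for scripts when the same source list contains a nonce-source, a hash-source, or strict-dynamic. The function name and the simplified regular expression are hypothetical, and this is not how the almanac query works today.

```python
import re

# Simplified pattern for nonce-source and hash-source expressions
# (an approximation of the CSP grammar, for illustration only).
_NONCE_OR_HASH = re.compile(
    r"^'(nonce-|sha(256|384|512)-)[A-Za-z0-9+/_-]+={0,2}'$", re.IGNORECASE
)


def effectively_allows_unsafe_inline(script_src_value: str) -> bool:
    """Return True only if 'unsafe-inline' is declared AND not neutralized.

    `script_src_value` is the source list of a single script-src directive,
    e.g. "'self' 'unsafe-inline'".
    """
    declared = False
    for source in script_src_value.split():
        lowered = source.lower()
        # A nonce, a hash, or 'strict-dynamic' makes browsers ignore 'unsafe-inline'.
        if _NONCE_OR_HASH.match(source) or lowered == "'strict-dynamic'":
            return False
        if lowered == "'unsafe-inline'":
            declared = True
    return declared
```

With this sketch, a backwards-compatible strict policy such as `'nonce-abc123' 'strict-dynamic' 'unsafe-inline' https:` is reported as not effectively allowing unsafe-inline, while `'self' 'unsafe-inline'` still is.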

I'm wondering: Do people agree with this assessment? If so, what should our next steps be? We could update csp_script_source_list_keywords.sql, or just save this issue as something to be addressed in the next edition of the almanac.

ddworken avatar Mar 12 '25 21:03 ddworken

cc: @JannisBush

ddworken avatar Mar 12 '25 21:03 ddworken

FYI @GJFR @vikvanderlinden @JannisBush

On the first, this is noted in the text;

However, the increasing adoption of the nonce- and strict-dynamic keywords is a positive development. By using the nonce- keyword, a secret nonce can be defined, allowing only inline scripts with the correct nonce to execute. This approach is a secure alternative to the unsafe-inline directive for permitting inline scripts. When used in combination with the strict-dynamic keyword, nonced scripts are permitted to import additional scripts from any origin. This approach simplifies secure script loading for developers, as it allows them to trust a single nonced script, which can then securely load other necessary resources.

But agree that it would be nice to know how many are setting unsafe-inline without strict-dynamic. One for next year maybe?

On the second, I think HTTP Archive collapses HTTP headers into one line, but would need to double check that.

tunetheweb avatar Mar 12 '25 22:03 tunetheweb

This essentially boils down to "effective keyword" vs "declared keyword". Currently, this query (and probably others in this and other chapters, since similar things likely affect other headers too) only counts the "declared keywords" and not the "effective keywords".

The caption of Figure 19 (corresponding to the query we are discussing here) states "Prevalence of CSP script-src keywords." An informed reader should be able to infer that the figure is about "declared keywords" and not about the "effective keywords". However, I agree that adding a note that the "declared keywords" are not necessarily the same as the "effective keywords" would be nice. In the best case, we would report on both the "declared keywords" and the "effective keywords". In my opinion, this would be a nice addition for next (this?) year and we do not necessarily need to update the 2024 chapter.


On the second point, good catch. We currently count the keywords for each header separately.

On the second, I think HTTP Archive collapses HTTP headers into one line, but would need to double check that.

I just tested, and the HAR from https://www.webpagetest.org/ lists each header separately; HTTP Archive also does not seem to perform automatic header folding (I have not explicitly checked for CSP but saw multiple server-timing and set-cookie headers in the sample data).

The description in the figure/query as total_pages_with_csp and Percent of Pages is wrong and should instead be total_csp_headers/Percent of CSP Headers or something similar. (In the other CSP queries the naming seems correct, only here it seems incorrect; OT: shouldn't it be Percentage and not Percent?). In the Security Headers Prevalence Query we count the number of pages that have at least one of each header. I think we did not compute numbers on how many responses have more than one header, how many headers they had on average, and so on; that would be interesting as well. In that query we also did not limit to is_main_document, so we cannot compare the number of "Pages with CSP" there to the "Number of CSP Headers" in the other queries to get an estimate of how much these two metrics diverge.


For a full "effective keyword" analysis, we would also need to take into account that a single header can contain multiple "serialized policies", which is effectively the same as sending multiple headers (CSP: A, B is the same as CSP: A\n CSP: B). Thus, we would need to first fold all CSP headers together and then run the full CSP algorithm (taking the list of policies as input) to infer the "effective keywords". Maybe there is a better way with a custom_metric. Can we simply ask the browser what the current "effective keywords" are and save them? There are probably even more edge cases one should consider: for example, if there is no script-src the browser falls back to default-src, and if there is script-src-elem and script-src-attr one should probably consider them and not the content of script-src.


Update:

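-- Compare the number of CSP headers with the number of distinct pages and URLs
-- that send at least one CSP header on their main document.
SELECT
  client,
  COUNT(0) AS num_csp_headers,  -- one row per CSP header after UNNEST
  COUNT(DISTINCT page) AS num_requests,
  COUNT(DISTINCT url) AS num_urls
FROM `httparchive.sample_data.requests_10k`,
UNNEST(response_headers) AS response_header
WHERE
  LOWER(response_header.name) = 'content-security-policy'
  AND is_main_document
GROUP BY
  client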
Client   Num CSP Headers  Num Requests  Num URLs
Mobile   3979             3782          3771
Desktop  3697             3483          3471

There seem to be slightly more CSP headers than requests (I currently cannot run it on the full dataset, as I am running it from my personal BigQuery account).
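
To make the difference between "percent of CSP headers" and "percent of pages" concrete, here is a small Python sketch on toy data. The rows, URLs, and the naive substring check in `has_nonce` are all hypothetical and only meant to show how the two denominators diverge once a page sends more than one CSP header.

```python
from collections import defaultdict


def has_nonce(header_value: str) -> bool:
    # Naive check for illustration; a real analysis would parse the policy.
    return "'nonce-" in header_value.lower()


def nonce_shares(rows: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (share of CSP headers with a nonce, share of pages with a nonce).

    `rows` holds one (page, csp_header_value) pair per CSP header,
    mirroring what UNNEST(response_headers) produces.
    """
    header_share = sum(has_nonce(value) for _, value in rows) / len(rows)

    # A page counts as nonce-based if at least one of its headers has a nonce.
    page_has_nonce: dict[str, bool] = defaultdict(bool)
    for page, value in rows:
        page_has_nonce[page] = page_has_nonce[page] or has_nonce(value)
    page_share = sum(page_has_nonce.values()) / len(page_has_nonce)

    return header_share, page_share


# Hypothetical sample: one page sends two policies, another sends one.
rows = [
    ("https://a.example/", "script-src 'nonce-abc' 'strict-dynamic'"),
    ("https://a.example/", "script-src https://cdn.example"),
    ("https://b.example/", "script-src 'self'"),
]
header_share, page_share = nonce_shares(rows)
```

In this toy sample one of three headers is nonce-based, but one of two pages is, so the per-header metric understates adoption at the page level.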

JannisBush avatar Mar 13 '25 09:03 JannisBush

OT: shouldn't it be Percentage and not Percent?

IMHO yes. But I think this is a European versus US English thing and I'm European. Percentage is more common in the former, while Percent is common in the latter in my experience.

Most style guides suggest "percent" when talking about a number and "percentage" for the more general use case. For example:

“Despite changing usage, Chicago continues to regard percent as an adverb (“per, or out of, each hundred,” as in 10 percent of the class)—or, less commonly, an adjective (a 10 percent raise)—and to use percentage as the noun form (a significant percentage of her income). The symbol %, however, may stand for either word.” (3.82)

tunetheweb avatar Mar 13 '25 10:03 tunetheweb

I agree that improving both metrics in the next almanac makes sense. It would be valuable to at least assess the potential discrepancy. If significant, we could retroactively add clarifying footnotes to previous editions?

@JannisBush, I like your idea of performing an 'effective keyword' analysis within the browser. However, I'm not sure if browsers provide functionality we could leverage for this. Perhaps there's a CSP evaluator tool, like Google's CSP Evaluator, that could help. This one for example detects whether unsafe-inline is ignored when strict-dynamic is present.

That said, I’m unsure how we could integrate this into the crawl/data analysis without adding excessive complexity.

GJFR avatar Mar 17 '25 09:03 GJFR

Maybe we could use some already existing Chrome Metrics for this purpose 🤔 I think they are available in the feature struct: https://har.fyi/reference/structs/feature/

Looking at https://chromestatus.com/metrics/feature/popularity and searching for CSP there are a couple of interesting ones such as CSPWithUnsafeEval.

Here https://source.chromium.org/chromium/chromium/src/+/main:docs/security/web-mitigation-metrics.md?q=CSPWith&ss=chromium%2Fchromium%2Fsrc and https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/frame/csp/content_security_policy.cc?q=CSPWithUnsafe&ss=chromium%2Fchromium%2Fsrc&start=11 are more details on these metrics.

JannisBush avatar Jun 10 '25 09:06 JannisBush

If you're interested in those metrics, you should check out https://mitigation.supply for an overview that (I believe) is based on those metrics.

ddworken avatar Jun 10 '25 14:06 ddworken