codeql
codeql copied to clipboard
Added detection for specific Polyfill.io CDN compromise - edited existing library and added new query and tests
Polyfill.io was a popular JavaScript polyflll Content Delivery Network (CDN), used to support new web browser standards on older browsers.
In February 2024 the domain was sold, and in June 2024 it was widely publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised.
Subresource Integrity (SRI) would have prevented the undetected replacement of the Polyfill script code.
This CodeQL query builds on a library, semmle/javascript/security/FunctionalityFromUntrustedSource.qll, which was abstracted from an existing query, javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.ql, to detect specific cases where a script from polyfill.io or cdn.polyfill.io was used without Subresource Integrity, and so the service using it was exposed to compromise.
I've modified that existing library to add polyfill.io and cdn.polyfill.io as domains that require SRI, which makes the existing query detect such uses, and I have added a getUrl() predicate to classes in the library so that I can filter down only to results from polyfill.io in a new query.
I added a specific test for polyfill.io use in its most common context, in an HTML script tag.
This query will work for cases in HTML or in JavaScript code, but will not find cases in any other languages such as PHP or Python where server-side scripting is used to generate client-side HTML or JavaScript.
A code search across github.com hosted sites will find more results manually.
QHelp previews:
javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp
errors/warnings:
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:9:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:11:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:15:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:17:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:25:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:34:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:40:4: The element type "p" must be terminated by the matching end-tag "</p>".
A fatal error occurred: 1 qhelp files could not be processed.
QHelp previews:
javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp
errors/warnings:
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp: Could not find sample polyfill-cloudflare-check.html
A fatal error occurred: 1 qhelp files could not be processed.
QHelp previews:
javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedDomain.qhelp
Untrusted domain used in script or other content
Content Delivery Networks (CDNs) are used to deliver content to users quickly and efficiently. However, they can change hands or be operated by untrustworthy owners, risking the security of the sites that use them. Some CDN domains are operated by entities that have used CDNs to deliver malware, which this query identifies.
For example, polyfill.io was a popular JavaScript CDN, used to support new web browser standards on older browsers. In February 2024 the domain was sold, and in June 2024 it was publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised. The same operator runs several other CDNs, undermining trust in those too.
Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element) on a page, it is important to ensure that the received data is not malicious.
Even when https is used, an untrustworthy operator might deliver malware.
See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of untrusted domains used by this query.
Recommendation
Carefully research the ownership of a Content Delivery Network (CDN) before using it in your application.
If you find code that originated from an untrusted domain in your application, you should review your logs to check for compromise.
To help mitigate the risk of including a script that could be compromised in the future, consider whether you need to use polyfill or another library at all. Modern browsers do not require a polyfill, and other popular libraries were made redundant by enhancements to HTML 5.
If you do need a polyfill service or library, move to using a CDN that you trust.
When you use a script or link element, you should check for subresource integrity (SRI), and pin to a hash of a version of the service that you can trust (for example, because you have audited it for security and unwanted features). A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as hashes for the content required for the major browsers used by your users.
You can also choose to self-host an uncompromised version of the service or library.
Example
The following example loads the Polyfill.io library from the polyfill.io CDN. This use was open to malicious scripts being served by the CDN.
<html>
<head>
<title>Polyfill.io demo</title>
<script src="https://cdn.polyfill.io/v2/polyfill.min.js" crossorigin="anonymous"></script>
</head>
<body>
...
</body>
</html>
Instead, load the Polyfill library from a trusted CDN, as in the next example:
<html>
<head>
<title>Polyfill demo - Cloudflare hosted with pinned version (but no integrity checking, since it is dynamically generated)</title>
<script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" crossorigin="anonymous"></script>
</head>
<body>
...
</body>
</html>
If you know which browsers are used by the majority of your users, you can list the hashes of the polyfills for those browsers:
<html>
<head>
<title>Polyfill demo - Cloudflare hosted with pinned version (with integrity checking for a *very limited* browser set - just an example!)</title>
<script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" integrity="sha384-i0IGVuZBkKZqwXTD4CH4kcksIbFx7WKFMdxN8zUhLFHpLdELF0ym0jxa6UvLhW8/ sha384-3d4jRKquKl90C9aFG+eH4lPJmtbPHgACWHrp+VomFOxF8lzx2jxqeYkhpRg18UWC" crossorigin="anonymous"></script>
</head>
<body>
...
</body>
</html>
References
- Sansec: Polyfill supply chain attack hits 100K+ sites
- Cloudflare: Upgrade the web. Automatically. Delivers only the polyfills required by the user's web browser.
- Fastly: New options for Polyfill.io users
- Wikipedia: Polyfill (programming)
- MDN Web Docs: Subresource Integrity
- Common Weakness Enumeration: CWE-830.
javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.qhelp
Inclusion of functionality from an untrusted source
Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element or an iframe element) on a page, it is important to ensure that the received data is not malicious.
When including external resources, it is possible to verify that the responding server is the intended one by using an https URL. This prevents a MITM (man-in-the-middle) attack where an attacker might have been able to spoof a server response.
Even when https is used, an attacker might still compromise the server. When you use a script element, you can check for subresource integrity - that is, you can check the contents of the data received by supplying a cryptographic digest of the expected sources to the script element. The script will only load sources that match the digest and an attacker will be unable to modify the script even when the server is compromised.
Subresource integrity (SRI) checking is commonly recommended when importing a fixed version of a library - for example, from a CDN (content-delivery network). Then, the fixed digest of that version of the library can easily be added to the script element's integrity attribute.
A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as those for the content generated for major browers used by your users.
See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of hostnames required to use SRI by this query.
Recommendation
When an iframe element is used to embed a page, it is important to use an https URL.
When using a script element to load a script, it is important to use an https URL and to consider checking subresource integrity.
Example
The following example loads the jQuery library from the jQuery CDN without using https and without checking subresource integrity.
<html>
<head>
<title>jQuery demo</title>
<script src="http://code.jquery.com/jquery-3.6.0.slim.min.js" crossorigin="anonymous"></script>
</head>
<body>
...
</body>
</html>
Instead, loading jQuery from the same domain using https and checking subresource integrity is recommended, as in the next example.
<html>
<head>
<title>jQuery demo</title>
<script src="https://code.jquery.com/jquery-3.6.0.slim.min.js" integrity="sha256-u7e5khyithlIdTpu22PHhENmPcRdFiHRjhAuHcs05RI=" crossorigin="anonymous"></script>
</head>
<body>
...
</body>
</html>
References
- MDN: Subresource Integrity
- Smashing Magazine: Understanding Subresource Integrity
- Common Weakness Enumeration: CWE-830.
Because Polyfill.io dynamically generates its content, SRI is apparently impossible.
I'm going to redraft this tomorrow to account for this.
I will refactor the library I made to separate known-bad domains from those requiring SRI.
I also need to remove any references to SRI being a mitigation, and edit the Cloudflare example somehow to make it work - my current example assumes SRI will function correctly.
Hmmm. From reading the updates in the article: https://sansec.io/research/polyfill-supply-chain-attack it sounds like the problem has been fixed.
The domain no longer exists (see status field in https://who.is/whois/polyfill.io), so https://polyfill.io no longer exists (you can safely click the link, it doesn't work).
That's totally true.
An alert about using this domain isn't going to change future exposure, but rather alert people to the fact that there was prior exposure.
That allows them to start a security response.
I've made a few changes in the interest of getting this merged.
I've extended the query now using data extensions, so it includes more domains operated by the same CDN operator who used polyfill.io maliciously. Those are still active, and present a clear risk to users of them who do not use SRI.
Nevertheless, I moved it to an experimental query. I'd rather it was merged into the main queries for widespread visibility of code that is or was exposed to this risk, but will accept merging into experimental and escalating a move to the main queries later.
It also allows users to extend the query using a data extensions model pack.
I also changed the existing CDN query to allow the extension of the CDN list that requires an SRI more easily in a data extension file. It no longer requires escaping dots in a regex - you just add the domain to the next line in a string in a tuple in the YAML file.
Hope that motivates merging this.
I've moved the query into the main set of queries, and slightly updated the changes-notes entry. Nothing practical changed, so 🤞 it passes cleanly in the new location
I did some evaluations, and they look good: (nightly, default (internal links)).
There are some TPs that I like.
Including a large webpage that previously used polyfill.io (but they have since changed to use the cloudfare hosted copy).
There is a lot of documentation here, and I would like some eyes from the doc team, but the PR is good to merge after that.
(The doc team gets pinged with the label I just added).
I'm on PTO the next few weeks, but you can ask in #codeql-dynamic for someone else to press the merge button.
Tell them that Erik said it's OK to merge.
@felicitymay thanks so much for the suggestions!
I transferred my note about using SRI with dynamic services into the other query help too. It's the same wording in both places now.
I thought about usability and added a note on CUSTOMIZING.md into a separate file from the query help, and referenced its existence in the query help.
It describes how someone can extend the queries with data extensions, to add their own hostname that needs SRI, or to add their own untrusted domain.
Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?
I'll definitely accept your suggestions when I'm back at my IDE, but I don't really want to slow down merging this to get this customisation docs part perfect.
Sound OK, or no go?
Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?
Happy to accept this approach. In that case, can we link directly to where the new file will be, that is: https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md instead of just mentioning the file name?