codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Added detection for specific Polyfill.io CDN compromise - edited existing library and added new query and tests

Open aegilops opened this issue 1 year ago • 5 comments

Polyfill.io was a popular JavaScript polyflll Content Delivery Network (CDN), used to support new web browser standards on older browsers.

In February 2024 the domain was sold, and in June 2024 it was widely publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised.

Subresource Integrity (SRI) would have prevented the undetected replacement of the Polyfill script code.

This CodeQL query builds on a library, semmle/javascript/security/FunctionalityFromUntrustedSource.qll, which was abstracted from an existing query, javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.ql, to detect specific cases where a script from polyfill.io or cdn.polyfill.io was used without Subresource Integrity, and so the service using it was exposed to compromise.

I've modified that existing library to add polyfill.io and cdn.polyfill.io as domains that require SRI, which makes the existing query detect such uses, and I have added a getUrl() predicate to classes in the library so that I can filter down only to results from polyfill.io in a new query.

I added a specific test for polyfill.io use in its most common context, in an HTML script tag.

This query will work for cases in HTML or in JavaScript code, but will not find cases in any other languages such as PHP or Python where server-side scripting is used to generate client-side HTML or JavaScript.

A code search across github.com hosted sites will find more results manually.

aegilops avatar Jul 01 '24 15:07 aegilops

QHelp previews:

javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:9:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:11:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:15:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:17:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:25:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:34:6: element "p" not allowed here; expected the element end-tag, text or element "a", "b", "code", "em", "i", "img", "strong", "sub", "sup" or "tt"
/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp:40:4: The element type "p" must be terminated by the matching end-tag "</p>".
A fatal error occurred: 1 qhelp files could not be processed.

github-actions[bot] avatar Jul 01 '24 15:07 github-actions[bot]

QHelp previews:

javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/javascript/ql/src/Security/CWE-830/PolyfillIOCompromisedScript.qhelp: Could not find sample polyfill-cloudflare-check.html
A fatal error occurred: 1 qhelp files could not be processed.

github-actions[bot] avatar Jul 01 '24 15:07 github-actions[bot]

QHelp previews:

javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedDomain.qhelp

Untrusted domain used in script or other content

Content Delivery Networks (CDNs) are used to deliver content to users quickly and efficiently. However, they can change hands or be operated by untrustworthy owners, risking the security of the sites that use them. Some CDN domains are operated by entities that have used CDNs to deliver malware, which this query identifies.

For example, polyfill.io was a popular JavaScript CDN, used to support new web browser standards on older browsers. In February 2024 the domain was sold, and in June 2024 it was publicised that the domain had been used to serve malicious scripts. It was taken down later in that month, leaving a window where sites that used the service could have been compromised. The same operator runs several other CDNs, undermining trust in those too.

Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element) on a page, it is important to ensure that the received data is not malicious.

Even when https is used, an untrustworthy operator might deliver malware.

See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of untrusted domains used by this query.

Recommendation

Carefully research the ownership of a Content Delivery Network (CDN) before using it in your application.

If you find code that originated from an untrusted domain in your application, you should review your logs to check for compromise.

To help mitigate the risk of including a script that could be compromised in the future, consider whether you need to use polyfill or another library at all. Modern browsers do not require a polyfill, and other popular libraries were made redundant by enhancements to HTML 5.

If you do need a polyfill service or library, move to using a CDN that you trust.

When you use a script or link element, you should check for subresource integrity (SRI), and pin to a hash of a version of the service that you can trust (for example, because you have audited it for security and unwanted features). A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as hashes for the content required for the major browsers used by your users.

You can also choose to self-host an uncompromised version of the service or library.

Example

The following example loads the Polyfill.io library from the polyfill.io CDN. This use was open to malicious scripts being served by the CDN.

<html>
    <head>
        <title>Polyfill.io demo</title>
        <script src="https://cdn.polyfill.io/v2/polyfill.min.js" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

Instead, load the Polyfill library from a trusted CDN, as in the next example:

<html>
    <head>
        <title>Polyfill demo - Cloudflare hosted with pinned version (but no integrity checking, since it is dynamically generated)</title>
        <script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

If you know which browsers are used by the majority of your users, you can list the hashes of the polyfills for those browsers:

<html>
    <head>
        <title>Polyfill demo - Cloudflare hosted with pinned version (with integrity checking for a *very limited* browser set - just an example!)</title>
        <script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?version=4.8.0" integrity="sha384-i0IGVuZBkKZqwXTD4CH4kcksIbFx7WKFMdxN8zUhLFHpLdELF0ym0jxa6UvLhW8/ sha384-3d4jRKquKl90C9aFG+eH4lPJmtbPHgACWHrp+VomFOxF8lzx2jxqeYkhpRg18UWC" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

References

javascript/ql/src/Security/CWE-830/FunctionalityFromUntrustedSource.qhelp

Inclusion of functionality from an untrusted source

Including a resource from an untrusted source or using an untrusted channel may allow an attacker to include arbitrary code in the response. When including an external resource (for example, a script element or an iframe element) on a page, it is important to ensure that the received data is not malicious.

When including external resources, it is possible to verify that the responding server is the intended one by using an https URL. This prevents a MITM (man-in-the-middle) attack where an attacker might have been able to spoof a server response.

Even when https is used, an attacker might still compromise the server. When you use a script element, you can check for subresource integrity - that is, you can check the contents of the data received by supplying a cryptographic digest of the expected sources to the script element. The script will only load sources that match the digest and an attacker will be unable to modify the script even when the server is compromised.

Subresource integrity (SRI) checking is commonly recommended when importing a fixed version of a library - for example, from a CDN (content-delivery network). Then, the fixed digest of that version of the library can easily be added to the script element's integrity attribute.

A dynamic service cannot be easily used with SRI. Nevertheless, it is possible to list multiple acceptable SHA hashes in the integrity attribute, such as those for the content generated for major browers used by your users.

See the [`CUSTOMIZING.md`](https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md) file in the source code for this query for information on how to extend the list of hostnames required to use SRI by this query.

Recommendation

When an iframe element is used to embed a page, it is important to use an https URL.

When using a script element to load a script, it is important to use an https URL and to consider checking subresource integrity.

Example

The following example loads the jQuery library from the jQuery CDN without using https and without checking subresource integrity.

<html>
    <head>
        <title>jQuery demo</title>
        <script src="http://code.jquery.com/jquery-3.6.0.slim.min.js" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

Instead, loading jQuery from the same domain using https and checking subresource integrity is recommended, as in the next example.

<html>
    <head>
        <title>jQuery demo</title>
        <script src="https://code.jquery.com/jquery-3.6.0.slim.min.js" integrity="sha256-u7e5khyithlIdTpu22PHhENmPcRdFiHRjhAuHcs05RI=" crossorigin="anonymous"></script>
    </head>
    <body>
        ...
    </body>
</html>

References

github-actions[bot] avatar Jul 01 '24 15:07 github-actions[bot]

Because Polyfill.io dynamically generates its content, SRI is apparently impossible.

I'm going to redraft this tomorrow to account for this.

I will refactor the library I made to separate known-bad domains from those requiring SRI.

I also need to remove any references to SRI being a mitigation, and edit the Cloudflare example somehow to make it work - my current example assumes SRI will function correctly.

aegilops avatar Jul 01 '24 17:07 aegilops

Hmmm. From reading the updates in the article: https://sansec.io/research/polyfill-supply-chain-attack it sounds like the problem has been fixed.
The domain no longer exists (see status field in https://who.is/whois/polyfill.io), so https://polyfill.io no longer exists (you can safely click the link, it doesn't work).

erik-krogh avatar Jul 01 '24 17:07 erik-krogh

That's totally true.

An alert about using this domain isn't going to change future exposure, but rather alert people to the fact that there was prior exposure.

That allows them to start a security response.

aegilops avatar Jul 08 '24 09:07 aegilops

I've made a few changes in the interest of getting this merged.

I've extended the query now using data extensions, so it includes more domains operated by the same CDN operator who used polyfill.io maliciously. Those are still active, and present a clear risk to users of them who do not use SRI.

Nevertheless, I moved it to an experimental query. I'd rather it was merged into the main queries for widespread visibility of code that is or was exposed to this risk, but will accept merging into experimental and escalating a move to the main queries later.

It also allows users to extend the query using a data extensions model pack.

I also changed the existing CDN query to allow the extension of the CDN list that requires an SRI more easily in a data extension file. It no longer requires escaping dots in a regex - you just add the domain to the next line in a string in a tuple in the YAML file.

Hope that motivates merging this.

aegilops avatar Jul 09 '24 17:07 aegilops

I've moved the query into the main set of queries, and slightly updated the changes-notes entry. Nothing practical changed, so 🤞 it passes cleanly in the new location

aegilops avatar Jul 11 '24 10:07 aegilops

I did some evaluations, and they look good: (nightly, default (internal links)).
There are some TPs that I like.
Including a large webpage that previously used polyfill.io (but they have since changed to use the cloudfare hosted copy).

There is a lot of documentation here, and I would like some eyes from the doc team, but the PR is good to merge after that. (The doc team gets pinged with the label I just added).
I'm on PTO the next few weeks, but you can ask in #codeql-dynamic for someone else to press the merge button.
Tell them that Erik said it's OK to merge.

erik-krogh avatar Jul 12 '24 08:07 erik-krogh

@felicitymay thanks so much for the suggestions!

I transferred my note about using SRI with dynamic services into the other query help too. It's the same wording in both places now.

I thought about usability and added a note on CUSTOMIZING.md into a separate file from the query help, and referenced its existence in the query help.

It describes how someone can extend the queries with data extensions, to add their own hostname that needs SRI, or to add their own untrusted domain.

aegilops avatar Jul 12 '24 11:07 aegilops

Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?

I'll definitely accept your suggestions when I'm back at my IDE, but I don't really want to slow down merging this to get this customisation docs part perfect.

Sound OK, or no go?

aegilops avatar Jul 12 '24 12:07 aegilops

Can I propose using the current in-repo markdown approach, and coming back after next week with a second PR to merge it into CodeQL docs, to improve things?

Happy to accept this approach. In that case, can we link directly to where the new file will be, that is: https://github.com/github/codeql/blob/main/javascript/ql/src/Security/CWE-830/CUSTOMIZING.md instead of just mentioning the file name?

felicitymay avatar Jul 12 '24 12:07 felicitymay