DOMPurify Need to block external calls, e.g. all HTTP requests

Background & Context

Need to block all direct server loads, i.e. any parts of the HTML that trigger any HTTP or server requests on rendering, without user interaction. Normal links like <a href=""> which activate only on user click should stay.

Why:

- When dealing with untrusted HTML, the HTTP calls it triggers can be a major problem, depending on the use case. If the call goes to the same site that is the target of the HTML injection, it may be a security problem if the server doesn't protect itself against it.
- Avoid data leaks, unintentional triggers, and exfiltration to third parties. E.g. if I allow web forum users to post HTML snippets, I do not want them to get an HTTP ping, including IP address and time of reading, from every reader of the post on my web forum. Similarly, when I sanitize an email, I need to filter outgoing HTTP calls to prevent spammers from getting receipt or read notifications, or even IP addresses and the times when a message was read.

Bug

Input

<img src="">
<img srcset="">
<video src="">
<video><source>
<svg><g>
<link> preload
and many more

CSS

@import
url(), some samples from sanitize CSS demo, but there are more:
- list-style: url(https://leaking.via/css-list-style);
- list-style-image: url(https://leaking.via/css-list-style-image);
- background: url(https://leaking.via/css-background);
- background-image: url(https://leaking.via/css-background-image);
- border-image: url(https://leaking.via/css-border-image);
- border-image-source: url(https://leaking.via/css-border-image-source);
- shape-outside: url(https://leaking.via/css-shape-outside);
- cursor: url(https://leaking.via/css-cursor), auto;
- svg circle
  - mask: url(https://leaking.via/svg-css-mask#foo);
  - filter: url(https://leaking.via/svg-css-filter#foo);
  - clip-path: url(https://leaking.via/svg-css-clip-path#foo);

and tons and tons of others.

Some are not even HTML tags nor attributes nor CSS values.

Given output

URL stays in sanitized HTML output, triggering direct HTTP loads on rendering.

Expected output

All URLs that would be loaded directly are removed from the HTML. When rendering the sanitized HTML, no outgoing calls are made.

Non-working solution

https://github.com/cure53/DOMPurify/blob/main/demos/hooks-link-proxy-demo.html has example code, but it replaces only 3 specific attributes. The web platform, however, has a huge number of features that trigger server requests (see above for a very small and incomplete subset). New ones are added constantly, and some of them are non-standard or experimental.

It is practically impossible for an individual app to keep up with all these. This list needs to be centrally managed by a library.

Feature

Add a feature switch that removes all URLs that would trigger a direct load on rendering, without user interaction. Keep links that activate only on user interaction/click. (Of course, retain all other sanitization features, including JS code removal, XSS removal etc.)

May 07 '24 19:05 benbucksch

Heya, with this neither being part of our threat model nor what we believe is achievable with a library like ours (given the problems you mentioned), I don't think we will work on a feature like this anytime soon without additional input or help.

I think what first needs to be done is to find out the following:

Can we even do this and, if so, how?

Would CSP be the answer, i.e. injecting an inline policy to block all outgoing requests? Or would a list of known request-emitting elements and attributes as well as CDATA be the way to go?

What are your thoughts, how would this best be approached?

May 08 '24 07:05 cure53

As with everything in security, I would go with a layered approach that combines several protections:

- Try to capture all tags (like <video>) and attribute names that carry URLs, and remove or replace them. I would start by composing a (long) list of tag and attribute names, and then blacklist them.
- Your proxy demo has a few good approaches, e.g. detecting all url() values in CSS and removing or replacing them. In HTML, we could detect any attribute value starting with "http:" or "https:" and remove or replace it.
- Find a way to completely block outgoing calls, even if they slip through and end up in the HTML. CSP is a great idea, if this can be done. Can you make a proposal what you have in mind there?
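
A minimal sketch of the first two layers described above, assuming they are wired into DOMPurify's uponSanitizeAttribute hook. The helper names (triggersAutoLoad, stripCssUrls, filterAttribute) and the attribute list are hypothetical and deliberately incomplete. The regex-based CSS handling is an illustration only and would not survive a determined bypass attempt.

```javascript
// Hypothetical, incomplete blocklist of attributes that load on render.
const AUTO_LOAD_ATTRS = new Set(['src', 'srcset', 'poster', 'background']);
// Tags whose href is only followed after a user click, so it is kept.
const CLICK_ONLY_HREF_TAGS = new Set(['a', 'area']);

// Returns true if the attribute is assumed to trigger a request on rendering.
function triggersAutoLoad(tagName, attrName) {
  const tag = tagName.toLowerCase();
  const attr = attrName.toLowerCase();
  if (AUTO_LOAD_ATTRS.has(attr)) return true;
  // href loads automatically on e.g. <link>, SVG <image> and <use>.
  if (attr === 'href' || attr === 'xlink:href') {
    return !CLICK_ONLY_HREF_TAGS.has(tag);
  }
  return false;
}

// Naive removal of @import rules and url() values from a CSS string.
function stripCssUrls(cssText) {
  return cssText
    .replace(/@import[^;]*;?/gi, '')
    .replace(/url\s*\([^)]*\)/gi, 'none');
}

// Hook body: rewrite inline styles, drop attributes on the blocklist.
function filterAttribute(node, data) {
  if (data.attrName.toLowerCase() === 'style') {
    data.attrValue = stripCssUrls(data.attrValue);
  } else if (triggersAutoLoad(node.nodeName, data.attrName)) {
    data.keepAttr = false; // tells DOMPurify to remove this attribute
  }
}

// Register the hook only where DOMPurify is actually loaded.
if (typeof DOMPurify !== 'undefined') {
  DOMPurify.addHook('uponSanitizeAttribute', filterAttribute);
}
```

This sketch only covers attributes. The content of <style> elements and any request-triggering feature missing from the list would still get through, which is why the third layer is needed as a backstop.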

May 08 '24 12:05 benbucksch

We do in fact have a project that once attempted to catch them all, here it is:

https://github.com/cure53/HTTPLeaks

However, this does not automatically cover new ways of leaking HTTP requests, so it will have to be actively maintained, and such an approach might be very prone to bypasses at first until it matures.

"CSP is a great idea, if this can be done. Can you make a proposal what you have in mind there?"

My thinking was, simply inject a META tag into every sanitized result that disallows anything to be requested unless it's same origin - or even nothing at all. This can already be done just so, by simply using a hook and injecting the META tag.

Oh, and one important bit of info, I will not be working on this implementation at all, I do not have time for this - but I am very open to reviewing designs, ideas, and pull requests. Just to clarify early on :)

May 08 '24 12:05 cure53

I think this should be quite close to what you need, correct? It's a (naive and very bad) implementation of a toggle for fetching content or not using CSP. I chose a sandboxed iframe with a srcdoc attribute, sanitized with default settings, and simply inject the right CSP policy depending on what the user chose.

<!doctype html>
<html>
    <head>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.1.2/purify.min.js"></script>
    </head>
    <body>
        <!-- Our IFRAME to receive content -->
        <iframe sandbox srcdoc id="sanitized"></iframe>

        <p>
            By default, nothing will be fetched, click button to toggle fetch on or off (see location.hash)
        </p>
        <p>
            <button onclick="location.hash ? location.hash = '' : location.hash = 'yes'">Fetch content?</button>
            <button onclick="location.reload();">Reload page</button>
        </p>

        <!-- Now let's sanitize that content -->
        <script>
            'use strict';
            
            // Specify dirty HTML
            const dirty = `<body><img src=https://cure53.de/img/menu/cure_53_logo.svg><p>HELLO<iframe/\/src=JavScript:alert&lpar;1)></ifrAMe><br>goodbye</p>`;
            
            // Specify strict inline CSP policy
            let csp = ``;
            
            if (location.hash.match(/yes/)) {
                csp = `<meta http-equiv="Content-Security-Policy" content="default-src *">`;
            } else {
                csp = `<meta http-equiv="Content-Security-Policy" content="default-src 'none'">`;
            }
            
            // Clean HTML string and write into the IFRAME
            const clean = DOMPurify.sanitize(dirty);
            sanitized.srcdoc = csp + clean;
        </script>
    </body>
</html>

May 11 '24 11:05 cure53

@cure53 Inserting <meta http-equiv="Content-Security-Policy" content="default-src 'none'"> into the doc is a nice solution, thank you! Obviously, I must first filter out the document's own <meta> tags and forbid JavaScript, so that the document cannot reverse the policy, given that the CSP is inside the doc itself. But then it should work.
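
A minimal sketch of that combination, assuming DOMPurify's FORBID_TAGS option is used to drop the document's own <meta> tags before the CSP is prepended. The names BLOCK_ALL_CSP, SANITIZE_CONFIG and buildSrcdoc are hypothetical, and the sanitizer is passed in as a function so the helper does not depend on a global.

```javascript
// Policy string taken from the demo earlier in this thread.
const BLOCK_ALL_CSP =
  `<meta http-equiv="Content-Security-Policy" content="default-src 'none'">`;

// Assumption: explicitly forbid the tags that could carry a competing
// policy or script, in addition to whatever DOMPurify removes by default.
const SANITIZE_CONFIG = { FORBID_TAGS: ['meta', 'base', 'script'] };

// Sanitizes the untrusted HTML, then puts the blocking CSP in front of it.
// `sanitize` is expected to behave like DOMPurify.sanitize(html, config).
function buildSrcdoc(dirty, sanitize) {
  const clean = sanitize(dirty, SANITIZE_CONFIG);
  return BLOCK_ALL_CSP + clean;
}

// Browser usage, with the iframe from the demo above:
//   const frame = document.getElementById('sanitized');
//   frame.srcdoc = buildSrcdoc(dirty, (html, cfg) => DOMPurify.sanitize(html, cfg));
```

The sandbox attribute on the iframe (without allow-scripts) remains a second barrier against script execution in case something slips past the sanitizer.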

Leaves the case where I need only a part of the page to be sanitized, e.g. a post in a web forum or conversation, or a "Description" field in a page, and I want the frame to size to its content. "seamless" iframes (sizing to their content) would be perfect, but unfortunately they were removed. That leaves the hacks that resize the iframe based on its content using JS, but browser security makes that hard, too (I have to reach into an iframe with a different origin).

There's a similar discussion in the sanitizer API in #228.

May 15 '24 13:05 benbucksch

"Leaves the case where I need only a part of the page to be sanitized"

Would that not be doable using the IN_PLACE config or by working with nodes directly?
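
For reference, a minimal sketch of what the IN_PLACE variant could look like. The helper name sanitizeSubtree and the element id in the usage comment are hypothetical; the purifier is passed in as a parameter and is expected to be the DOMPurify object.

```javascript
// Sanitizes an existing DOM subtree instead of building a new document.
function sanitizeSubtree(purifier, node) {
  // IN_PLACE asks DOMPurify to work on the given node itself
  // rather than on a copy, and to return that same node.
  return purifier.sanitize(node, { IN_PLACE: true });
}

// Browser usage (the id 'description' is an assumption):
//   sanitizeSubtree(DOMPurify, document.getElementById('description'));
```

This only removes what DOMPurify's configuration removes. It does not add the CSP barrier from the iframe approach, so request-triggering markup that survives sanitization would still load.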

May 18 '24 09:05 cure53

Closing this for now, as there is no action planned.

Jun 05 '24 15:06 cure53