thebe icon indicating copy to clipboard operation
thebe copied to clipboard

Security note about XSS in the docs

Open rowanc1 opened this issue 4 years ago • 21 comments

I think it is worth noting that Thebe injects scripts and javascript into the page, and that is potentially a security threat for folks if they are using this in authorized pages (e.g. making API calls from the page). This can be known as self-xss, and could be a problem.

For example, executing:

%%html
<script>document.body.innerHTML = "";</script>

Will completely delete everything on the page. :)

I think a note about this for developers that it should be used on static webpages only, where there is no harm that can come to the user. If folks are game, I can take a stab at the wording for review.

rowanc1 avatar Oct 17 '20 17:10 rowanc1

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

welcome[bot] avatar Oct 17 '20 17:10 welcome[bot]

This sounds like a good addition. Go for it.

moorepants avatar Oct 17 '20 17:10 moorepants

On it! 👍

rowanc1 avatar Oct 17 '20 17:10 rowanc1

Lol that example is hilarious

I kind of wonder if we should sanitize certain kinds of outputs

choldgraf avatar Oct 17 '20 17:10 choldgraf

I think there are some other ways to do this as the library evolves, sandboxed iframes for example. But it in much more complicated. :)

We are thinking about this a lot at Curvenote, because we are in an authorized environment.

rowanc1 avatar Oct 17 '20 17:10 rowanc1

Yeah - definitely something to think about. An iframe wouldn't be that much extra work would it? I wonder if juniper has any safety hatches in this way

choldgraf avatar Oct 17 '20 18:10 choldgraf

It actually is a bit more work, because the iframe needs to be hosted on a different domain than the docs. This is why a lot of sites have githubusercontent.com etc. You then need to communicate with the iframe with a post-message in our case (or have a stateful server side component, which is way more complicated).

rowanc1 avatar Oct 17 '20 18:10 rowanc1

Ok I take it back that is much more complicated lol. We could try some hackier sanitation like special-casing %%html and other common ways to inject arbitrary html. Since we have access to the contents of the cell before it is sent to the kernel and the same on the return trip

choldgraf avatar Oct 17 '20 18:10 choldgraf

I also think that this is the power of thebe? Similar to Jupyter which really doesn't stop you from doing this either.

It is (I think) completely safe if run on a static site like public RTDs. I just don't want someone dropping this into their app and then having some weird thing and realizing it was thebe!

There are some of the components of thebe, like kernel management etc, that might be nice to have access to in a different way. Need to think on that a bit more. :)

rowanc1 avatar Oct 17 '20 18:10 rowanc1

For this security issue, does it rely on a bad-acting author writing the XSS code, or are there other ways that this could be carried out? This affects to what extent this is an administrative problems versus something that must be handled at a technical/code level.

One possible solution I was thinking of is that Thebe optionally strips %%html and other insecure prompts based on Thebe's config. Deciding whether or not this should be secure/stripped be default or insecure/backwards-compatible I do not know.

Miniland1333 avatar Oct 17 '20 19:10 Miniland1333

It actually is a bit more work, because the iframe needs to be hosted on a different domain than the docs. This is why a lot of sites have githubusercontent.com etc. You then need to communicate with the iframe with a post-message in our case (or have a stateful server side component, which is way more complicated).

In case this is relevant, you can programmatically instantiate new, arbitrary iframes in HTML/Javascript instead of using src.

var ifrm = document.getElementById('myIframe');
ifrm = ifrm.contentWindow || ifrm.contentDocument.document || ifrm.contentDocument;
ifrm.document.open();
ifrm.document.write('Hello World!');
ifrm.document.close();
//example from https://stackoverflow.com/questions/997986/write-elements-into-a-child-iframe-using-javascript-or-jquery

Since this is considered same-site, you have much greater control over the iframe and can freely interact with the document. https://developer.mozilla.org/en-US/docs/Web/API/HTMLIFrameElement/contentDocument

Miniland1333 avatar Oct 17 '20 20:10 Miniland1333

I think that the "self-xss" is the biggest problem. Where someone copies something in, doing something they don't expect. If you open up the console on any google or facebook site, there are usually warnings about this. :)

Do you know off hand if the child can access the parent at all in that case of creating an iframe? However, if it is on the same site, then some of the XSS concerns are still valid (I think?!), for example if you have a cookie used in an API request. That iframe could act as you.

rowanc1 avatar Oct 17 '20 20:10 rowanc1

I think that the "self-xss" is the biggest problem. Where someone copies something in, doing something they don't expect. If you open up the console on any google or facebook site, there are usually warnings about this. :)

Do you know off hand if the child can access the parent at all in that case of creating an iframe? However, if it is on the same site, then some of the XSS concerns are still valid (I think?!), for example if you have a cookie used in an API request. That iframe could act as you.

Some quick Googling/Stack overflowing indicates you are correct in that while an iframe prevents normal XSS attacks from working, if the script instead explicitly access the parent's DOM then it can wreak havoc. So stealPasswords(document) wouldn't work, but stealPasswords(parent.document) would.

For a technical fix, one possible solution I was thinking of is that Thebe optionally strips %%html and other insecure prompts based on Thebe's config. If this option was set, it would prevent this injection from working unless the user tampered with the Thebe config as well (at which point they probably should know better). Deciding whether or not this should be secure/stripped by default or insecure/backwards-compatible I do not know.

Miniland1333 avatar Oct 17 '20 20:10 Miniland1333

This closed automatically due to the linked PR. I think there are maybe a few more things to discuss, but that could potentially happen on a new issue on how to deal with security.

My take is: don't deal with security in thebe! It works really well, is safe on non-authorized pages, and is consistent with Jupyter.

I think there might be another issue to bring up like, "how to use thebe in a secure environment" or something. That might require splitting the library into a few components though! Like something that deals with kernel connections and then something that renders the outputs to output areas.

rowanc1 avatar Oct 17 '20 21:10 rowanc1

I'll re-open for now in case there are other major main points to discuss, since some nice ideas have already been thrown around in this issue 👍

choldgraf avatar Oct 17 '20 22:10 choldgraf

Actually, this line might be where we can handle this?

let model = new OutputAreaModel({ trusted: true });

Turning that to false, starts to kill all the interactivity. Maybe someone from the core team can chime in and let us know if that is the case, if so -- we should turn this into a config option!

This is what it looks like with safe on, notice the last one --- no html injected! image

rowanc1 avatar Oct 17 '20 23:10 rowanc1

While that appears to work from your photo and it should likely be made into a config option, I have a feeling it may be too restrictive to cover all use cases. I wish there was a config option to just disable %%html and %%javascript since those provide the most unprotected injection point for abuse.

Miniland1333 avatar Oct 17 '20 23:10 Miniland1333

I think there are some other ways to do this as the library evolves, sandboxed iframes for example. But it in much more complicated. :)

We are thinking about this a lot at Curvenote, because we are in an authorized environment.

This is a tangential question for @rowanc1: Specifically for Curvenote in an "authorized environment", do you think that code signing (generating a cryptographic signature or integrity hash) would be a useful security feature long-term?

I can imagine a scenario for Curvenote where the code blocks are cryptographically signed during authoring so that while public users can see the code and the server is publicly-accessible, they can ONLY run code authored with a valid cryptographic signature. This would help prevent many possible forms of arbitrary-code abuse.

For LibreTexts (an open textbook initative I work with), this makes sense as we have a controlled authoring environment and jupyterhub, but until we have signing we have no control over what code users send from Thebe as they can freely modify it.

Miniland1333 avatar Oct 17 '20 23:10 Miniland1333

Interesting idea on the hashing. I think we are probably closer to the open-textbook case at Curvenote - we have an initial notebook that we then want to let people play with around the edges to modify outputs. So because we want to allow them to change the code, the hashing won't work. :)

Another security model is voila, I think they lock down the protocol to be more what you are talking about -- only allowing widget changes to propagate. A bit different than the aims that thebe has I think.

rowanc1 avatar Oct 18 '20 00:10 rowanc1

Actually just a note that I think it'd be really useful to make it possible to use thebe for running code that is immutable, displaying the outputs w/o the code, and connecting with q kernel purely for widgets. Similar to voila

choldgraf avatar Oct 18 '20 01:10 choldgraf

Should we rename this issue or open another one, specifically around the configuration option (or something like it) suggested by @Miniland1333 where we can run in a safe mode where, potentially unsafe outputs are intercepted / disabled somehow? from reading the thread above that seems to be the best next step.

Also I'm turning @choldgraf's last comment into a separate issue, I think that is very valuable and something separate

stevejpurves avatar Jul 19 '21 10:07 stevejpurves