thebe
thebe copied to clipboard
Security note about XSS in the docs
I think it is worth noting that Thebe injects scripts and javascript into the page, and that is potentially a security threat for folks if they are using this in authorized pages (e.g. making API calls from the page). This can be known as self-xss, and could be a problem.
For example, executing:
%%html
<script>document.body.innerHTML = "";</script>
Will completely delete everything on the page. :)
I think a note about this for developers that it should be used on static webpages only, where there is no harm that can come to the user. If folks are game, I can take a stab at the wording for review.
Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:
This sounds like a good addition. Go for it.
On it! 👍
Lol that example is hilarious
I kind of wonder if we should sanitize certain kinds of outputs
I think there are some other ways to do this as the library evolves, sandboxed iframes for example. But it in much more complicated. :)
We are thinking about this a lot at Curvenote, because we are in an authorized environment.
Yeah - definitely something to think about. An iframe wouldn't be that much extra work would it? I wonder if juniper has any safety hatches in this way
It actually is a bit more work, because the iframe needs to be hosted on a different domain than the docs. This is why a lot of sites have githubusercontent.com
etc. You then need to communicate with the iframe with a post-message in our case (or have a stateful server side component, which is way more complicated).
Ok I take it back that is much more complicated lol. We could try some hackier sanitation like special-casing %%html and other common ways to inject arbitrary html. Since we have access to the contents of the cell before it is sent to the kernel and the same on the return trip
I also think that this is the power of thebe? Similar to Jupyter which really doesn't stop you from doing this either.
It is (I think) completely safe if run on a static site like public RTDs. I just don't want someone dropping this into their app and then having some weird thing and realizing it was thebe!
There are some of the components of thebe, like kernel management etc, that might be nice to have access to in a different way. Need to think on that a bit more. :)
For this security issue, does it rely on a bad-acting author writing the XSS code, or are there other ways that this could be carried out? This affects to what extent this is an administrative problems versus something that must be handled at a technical/code level.
One possible solution I was thinking of is that Thebe optionally strips %%html
and other insecure prompts based on Thebe's config. Deciding whether or not this should be secure/stripped be default or insecure/backwards-compatible I do not know.
It actually is a bit more work, because the iframe needs to be hosted on a different domain than the docs. This is why a lot of sites have
githubusercontent.com
etc. You then need to communicate with the iframe with a post-message in our case (or have a stateful server side component, which is way more complicated).
In case this is relevant, you can programmatically instantiate new, arbitrary iframes in HTML/Javascript instead of using src.
var ifrm = document.getElementById('myIframe');
ifrm = ifrm.contentWindow || ifrm.contentDocument.document || ifrm.contentDocument;
ifrm.document.open();
ifrm.document.write('Hello World!');
ifrm.document.close();
//example from https://stackoverflow.com/questions/997986/write-elements-into-a-child-iframe-using-javascript-or-jquery
Since this is considered same-site, you have much greater control over the iframe and can freely interact with the document. https://developer.mozilla.org/en-US/docs/Web/API/HTMLIFrameElement/contentDocument
I think that the "self-xss" is the biggest problem. Where someone copies something in, doing something they don't expect. If you open up the console on any google or facebook site, there are usually warnings about this. :)
Do you know off hand if the child can access the parent at all in that case of creating an iframe? However, if it is on the same site, then some of the XSS concerns are still valid (I think?!), for example if you have a cookie used in an API request. That iframe could act as you.
I think that the "self-xss" is the biggest problem. Where someone copies something in, doing something they don't expect. If you open up the console on any google or facebook site, there are usually warnings about this. :)
Do you know off hand if the child can access the parent at all in that case of creating an iframe? However, if it is on the same site, then some of the XSS concerns are still valid (I think?!), for example if you have a cookie used in an API request. That iframe could act as you.
Some quick Googling/Stack overflowing indicates you are correct in that while an iframe prevents normal XSS attacks from working, if the script instead explicitly access the parent's DOM then it can wreak havoc. So stealPasswords(document)
wouldn't work, but stealPasswords(parent.document)
would.
For a technical fix, one possible solution I was thinking of is that Thebe optionally strips %%html and other insecure prompts based on Thebe's config. If this option was set, it would prevent this injection from working unless the user tampered with the Thebe config as well (at which point they probably should know better). Deciding whether or not this should be secure/stripped by default or insecure/backwards-compatible I do not know.
This closed automatically due to the linked PR. I think there are maybe a few more things to discuss, but that could potentially happen on a new issue on how to deal with security.
My take is: don't deal with security in thebe! It works really well, is safe on non-authorized pages, and is consistent with Jupyter.
I think there might be another issue to bring up like, "how to use thebe in a secure environment" or something. That might require splitting the library into a few components though! Like something that deals with kernel connections and then something that renders the outputs to output areas.
I'll re-open for now in case there are other major main points to discuss, since some nice ideas have already been thrown around in this issue 👍
Actually, this line might be where we can handle this?
let model = new OutputAreaModel({ trusted: true });
Turning that to false
, starts to kill all the interactivity. Maybe someone from the core team can chime in and let us know if that is the case, if so -- we should turn this into a config option!
This is what it looks like with safe on, notice the last one --- no html injected!
While that appears to work from your photo and it should likely be made into a config option, I have a feeling it may be too restrictive to cover all use cases.
I wish there was a config option to just disable %%html
and %%javascript
since those provide the most unprotected injection point for abuse.
I think there are some other ways to do this as the library evolves, sandboxed iframes for example. But it in much more complicated. :)
We are thinking about this a lot at Curvenote, because we are in an authorized environment.
This is a tangential question for @rowanc1: Specifically for Curvenote in an "authorized environment", do you think that code signing (generating a cryptographic signature or integrity hash) would be a useful security feature long-term?
I can imagine a scenario for Curvenote where the code blocks are cryptographically signed during authoring so that while public users can see the code and the server is publicly-accessible, they can ONLY run code authored with a valid cryptographic signature. This would help prevent many possible forms of arbitrary-code abuse.
For LibreTexts (an open textbook initative I work with), this makes sense as we have a controlled authoring environment and jupyterhub, but until we have signing we have no control over what code users send from Thebe as they can freely modify it.
Interesting idea on the hashing. I think we are probably closer to the open-textbook case at Curvenote - we have an initial notebook that we then want to let people play with around the edges to modify outputs. So because we want to allow them to change the code, the hashing won't work. :)
Another security model is voila, I think they lock down the protocol to be more what you are talking about -- only allowing widget changes to propagate. A bit different than the aims that thebe has I think.
Actually just a note that I think it'd be really useful to make it possible to use thebe for running code that is immutable, displaying the outputs w/o the code, and connecting with q kernel purely for widgets. Similar to voila
Should we rename this issue or open another one, specifically around the configuration option (or something like it) suggested by @Miniland1333 where we can run in a safe mode where, potentially unsafe outputs are intercepted / disabled somehow? from reading the thread above that seems to be the best next step.
Also I'm turning @choldgraf's last comment into a separate issue, I think that is very valuable and something separate