nuxt-security
nuxt-security copied to clipboard
Script injection vulnerability in combination with useHead
Environment
- Operating System: Linux
- Node Version: v18.20.3
- Nuxt Version: 3.16.0
- CLI Version: 3.23.0
- Nitro Version: 2.11.6
- Package Manager: [email protected]
- Builder: -
- User Config: compatibilityDate, devtools, modules
- Runtime Modules: [email protected]
- Build Modules: -
Nuxt Security Version
v2.2.0
Default setup used?
Yes, the bug happens even if the security option is not customized
Security options
Reproduction
- open https://stackblitz.com/edit/nuxt-starter-94vb5abo?file=app.vue,nuxt.config.ts
- it's a default Nuxt project with default nuxt-security config
- in app.vue, the string
maliciousUserInputcontains two script tags - this string is used as value for
useHead/useHeadSafelink/meta tags
Description
- when nuxt-security is turned off, the script is not executed as it's just a string inside of a meta tag attribute value (
<meta property="og:title" content="<script></script><script>alert(1)</script>">). All good. - but when enabling nuxt-security, it adds
nonceattributes which then result in<meta property="og:title" content="<script nonce="xXNuFdmH0kSn/o6NLkUvBI1S"></script><script nonce="xXNuFdmH0kSn/o6NLkUvBI1S">alert(1)</script>">; hence it terminates thecontentattribute and the script injection is successful. When opening the page, an alert is shown - which should not happen. - using
useHeadSafeonly helps partly, it prevents the script injection on the<link>tag but the one in<meta>tag still persists
On first thought, I see two solutions:
- make nuxt-security nonce logic smarter to not replace
- adjust
useHead/useHeadSafeto encode<>characters so that nuxt-security's logic won't match anymore
What do you think?
Additional context
No response
Logs
Hey there,
Thanks for reporting this issue. Interesting case to be sure!
@vejja do you have some ideas why this happends? :)
Hi all This is essentially the same problem as #594 @GalacticHypernova would you by any chance have some time to modify the regex?
@Baroshem
The issue is on our side, and is related to the regex mentioned in #594.
Basically we are replacing all <script> elements without verifying if they are valid HTML elements.
If you remember, we used to rely on Cheerio, which correctly parses the DOM element by element, but it was too resource-intensive, especially in workers.
We moved to regex-based parsing, and our regex is too lax
The correct solution is as suggested by @jschroeter:
- make nuxt-security nonce logic smarter to not replace
This issue is indeed the downside of having regex based parsing, but I wouldn't necessarily say it's an issue of the module.
Having regex based parsing ensures valid scripts are captured:
<div>"<script>alert(1)</script>"</div>
But it is also stateless, and can't know the context of the match. I could look into modifying the regex a bit, but its complexity will undoubtedly increase, by a margin dependent on how the build process would parse it in different configurations.
For example, if one disables minifications, and the build process bundles it as is, there may be a need to handle multiline attribute patterns to capture the following and correctly assess that it's safe:
<div
class="someClass"
someAttr="<script>alert(1)</script>"
>
There could potentially be other such edge cases that would require the pattern to grow in complexity if we want to accurately handle it all, but I believe there's also the question of how much we can truly do all the work for users.
I believe everyone knows that placing script tags in any tag is bad practice, even if it's a simple string. We mustn't forget that the more complex the patterns are, the more expensive and resource-intensive they become. Just like if developers use direct v-html at runtime on unsanitized user input, there are certain standards that any developer should follow as it is infeasible for an automated algorithm to capture all potential quirks.
That being said, if you'd like me to look into trying to improve the regex, let me know!
I’m wondering whether we should modify our algorithm.
Instead of trying to craft the perfect regex (which, as @GalacticHypernova rightly says, will exponentially grow in complexity), why don’t we parse the input string recursively until we find a <script> tag ?
The logic could be:
- use regex to find any pair of matching opening and closing tags (carefully detecting self-closing elements)
- move inside the inner text inside that pair
- start again recursively until we find a script
That logic would mimic the standard strategy of an html parser, eliminating false positives like the one evidenced in this issue. In addition, it would be more readable and maintainable than a greedy recursive regex.
What do you guys think ?
Thanks a lot for looking into it asap and proposing a fix - and also for keeping an eye on performance!
Just my 2 cents on...
I believe everyone knows that placing script tags in any tag is bad practice, even if it's a simple string.
Yes and no :-) Obviously it's bad practice but it can happen with user generated content. Now we can argue that the dev has to manually sanitise all input, which is kind of what the Unhead docs suggests. But when reading the Nuxt docs it sounds like simply using useHeadSafe is fine, like the name implies. We could open an issue there and suggest that useHeadSafe should encode < and >? But actually there is no problem with those characters as long as you don't use nuxt-security.
I wouldn't necessarily say it's an issue of the module
Like said, there is no issue if the module isn't used. People add nuxt-security to improve security but in this case ironically a new issue occurs. So if possible, I think it would make most sense to fix it here.
Let me know when I can help e.g. by trying it out.
Hi @jschroeter Unfortunately the PR is only a very preliminary draft. I asked Copilot to propose a fix and it kind of understood what the intent was but it is not addressing the issue. I’m short of time right now, obviously if you want to give it a shot it would be wonderful. I’ll try to find time later to fix
happen with user generated content
I get that, but ideally this situation should never happen. It's the same concept as using v-html, which directly bypasses all security restrictions. It's just something that shouldn't happen in the first place.
This issue has 2 sides:
- Using regex makes it far more memory efficient, which is critical for constrained environments like serverless or cloud infrastructure that bills on usage. But it also goes off of the assumption that the entire HTML is controlled by the developer, and puts trust that the developer follows web standards. Due to it being stateless, it can't reliably know the context of the tag without becoming exponentially resource-intensive and thus negating the very benefit it came to utilize.
- Not using regex (i.e. actually parsing the HTML) address the stateless downside of regex, and provides increased accuracy, but becomes a bottleneck in constrained environments.
If I look at this at an overview, the impact of using a parser is far worse than that of the stateless regex. Many people use serverless deployments, far more than those who create html based on user content. By addressing the accuracy concern, we will indirectly introduce a critical bottleneck that will render many serverless-deployed apps unusable, and thus impact a lot more developers.
This situation isn't really easy to solve, and it requires some careful planning. If there can't be a thorough enough regex while maintaining performance, it would be a serious risk to try and use a parser or something else that's far more memory intensive.
I have been experimenting a bit with regexes but I am yet to find a solution, I'll keep looking though.
Hey!
I sincerely apologize for the amount of time it took me to come up with a solution to this.
Right about the time I made the previous iteration, I've gotten extremely busy with my personal life and had practically no time to attend to this properly.
I've made a PR to address this issue #658