Vite fails to inject assets into the correct place in the document.
Describe the bug
The Problem
I am trying to build my project, but Vite is injecting my scripts and CSS into a string literal within an inline script instead of inside the real <head> block. This is clearly happening because the string literal contains the first instance of the string </head>, and Vite assumes this is the end of the head block. The reproduction URL has a very similar case where it is injecting into a comment instead of a string literal, but the problem is the same.
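The failure mode can be sketched in isolation. The pattern below is an assumption modeled on a simple "first </head> wins" search and may not be the exact expression used in html.ts; the stylesheet path is made up for illustration.

```typescript
// Assumed pattern for illustration only; the real one in html.ts may differ.
const headCloseRE = /([ \t]*)<\/head>/i;

const source = [
  "<!doctype html>",
  "<html>",
  "<head>",
  '<script>const s = "</head>";</script>',
  "</head>",
  "<body></body>",
  "</html>",
].join("\n");

// String.prototype.replace with a non-global regex only touches the first
// match, which here sits inside the string literal of the inline script.
const injected = source.replace(
  headCloseRE,
  (match: string, indent: string) =>
    `${indent}<link rel="stylesheet" href="/app.css">${match}`,
);

// Position of the first regex match versus the real closing tag.
const firstMatchIndex = source.search(headCloseRE);
const realCloseIndex = source.lastIndexOf("</head>");

console.log(firstMatchIndex < realCloseIndex); // true
console.log(injected);
```

The printed document has the link tag inside the script's string literal, while the real closing tag is left untouched.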
The Cause
In the file packages/vite/src/node/plugins/html.ts, starting at line 1467 is a section on injecting into various parts of the HTML document. It is evident right away that this is using regex to identify the correct location to inject. It is widely known that HTML and RegEx don't mix particularly well. To make matters more difficult, RegEx in JS does not support variable length look behind assertions. Because of the limitations of RegEx, especially in JS, RegEx is not a robust solution for identifying the end of a head block, or really any other location in an HTML document that relies on understanding HTML structure.
Solution
Without a reliable way to identify the correct location using RegEx, my recommendation is to use JSDOM instead. This is likely a less performant solution, but it would be more robust and reliable. If performance is a concern, a flag could be added to the config to use the RegEx version instead. In my case, lower performance would be an acceptable trade-off for correctness.
Reproduction
https://stackblitz.com/edit/vitejs-vite-fly8vq?file=index.html
Steps to reproduce
The problem will be reproduced any time the document contains the string </head> above the real closing tag of the head block.
Run vite build in the terminal at the reproduction URL and compare the index.html in the build output to the source index.html. You will see that the script tag is inserted in the comment instead of the actual HTML.
System Info
Though it is likely irrelevant in this case:
Windows 10, vite, vite-plugin-inline-source, vite-plugin-minify
Used Package Manager
npm
Logs
No response
Validations
- [x] Follow our Code of Conduct
- [x] Read the Contributing Guidelines.
- [x] Read the docs.
- [x] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- [x] Make sure this is a Vite issue and not a framework-specific issue. For example, if it's a Vue SFC related bug, it should likely be reported to vuejs/core instead.
- [x] Check that this is a concrete bug. For Q&A open a GitHub Discussion or join our Discord Chat Server.
- [x] The provided reproduction is a minimal reproducible example of the bug.
I just ran into this issue when setting up a project. I had some commented-out code that included </head> above the 'real', uncommented </head>.
For an even sillier example that breaks: <!doctype html><title>The <header> element</title>
Vite will mistake the text <header> inside that title for a <head> tag.
I alleviated a similar issue a while back in Hugo and found the same problem in VS Code a couple of weeks ago. I haven't found the time to make a PR there or here yet, but feel free to steal the code I wrote there.
The problem is threefold:
- Using regular expressions to naively search for tags within an arbitrary HTML fragment is impossible. Many constructs can be nested and hold arbitrary text that looks like tags, so determining if a piece of code is actually a tag is impossible without context.
- Beyond that, using regular expressions in the form /<tagname[^>]*>/ to match entire tags isn't reliable. This leads to injections after <header> instead of <head> or inside an attribute value that contains a >.
- Scanning through the whole document for a specific injection point, then falling back to searching for another injection point, is inefficient and increases the risks of false positives.
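The second point can be checked directly. This is a small sketch using the pattern shape quoted above with tagname set to head; the variable names and the data-note attribute are made up for illustration.

```typescript
// The pattern shape from the list above, with "head" as the tag name.
const naiveHeadOpenRE = /<head[^>]*>/i;

const titleDoc = "<!doctype html><title>The <header> element</title>";
const naiveMatch = naiveHeadOpenRE.exec(titleDoc)?.[0];
console.log(naiveMatch); // "<header>"

// Requiring a delimiter after the tag name rules out <header>...
const stricterHeadOpenRE = /<head(?=[\s/>])[^>]*>/i;
console.log(stricterHeadOpenRE.test(titleDoc)); // false

// ...but [^>]* still stops at a ">" inside a quoted attribute value.
const attrDoc = '<head data-note="a > b"><meta charset="utf-8"></head>';
const attrMatch = stricterHeadOpenRE.exec(attrDoc)?.[0];
console.log(attrMatch); // '<head data-note="a >'
```

Tightening the pattern removes one false positive, but the match still ends in the middle of the attribute, so an injection placed after it would corrupt the tag.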
I recommend scanning the document from the start, and only reading past whitespace, comments, the doctype, the html tag and the head tag. HTML comments cannot be nested, so they can be consumed reliably, and the risk of encountering a breaking attribute on the <html> or <head> tags is low.
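A minimal sketch of that scanning approach, assuming the goal is to find the offset just after the opening <head> tag. Both function names are hypothetical and not part of Vite, and the sketch deliberately gives up (returns -1) as soon as it meets anything other than whitespace, a comment, the doctype, <html> or <head>.

```typescript
// Hypothetical helper: index just past the ">" that closes the tag starting
// at `start`, skipping any ">" inside quoted attribute values. -1 if none.
function tagEnd(html: string, start: number): number {
  let quote: string | null = null;
  for (let i = start; i < html.length; i++) {
    const c = html[i];
    if (quote !== null) {
      if (c === quote) quote = null;
    } else if (c === '"' || c === "'") {
      quote = c;
    } else if (c === ">") {
      return i + 1;
    }
  }
  return -1;
}

// Hypothetical helper: offset just after the opening <head> tag, or -1 when
// the document does not start with the expected prologue.
function findHeadInsertionPoint(html: string): number {
  // Sticky regex: only matches at lastIndex, never further into the document.
  const tagRE = /<(html|head)(?=[\s/>])/iy;
  let i = 0;
  while (i < html.length) {
    if (/\s/.test(html[i])) {
      i++;
      continue;
    }
    if (html.startsWith("<!--", i)) {
      // Comments cannot nest, so the first "-->" ends the comment.
      const close = html.indexOf("-->", i + 4);
      if (close === -1) return -1;
      i = close + 3;
      continue;
    }
    if (html.slice(i, i + 9).toLowerCase() === "<!doctype") {
      const end = tagEnd(html, i);
      if (end === -1) return -1;
      i = end;
      continue;
    }
    tagRE.lastIndex = i;
    const m = tagRE.exec(html);
    if (m === null) return -1; // anything else: stop instead of guessing
    const end = tagEnd(html, i);
    if (end === -1) return -1;
    if (m[1].toLowerCase() === "head") return end;
    i = end; // consumed <html ...>, keep scanning
  }
  return -1;
}
```

With this sketch, a document such as <!-- </head> --><html lang="en"><head>... yields the offset right after the real <head>, while <!doctype html><title>The <header> element</title> yields -1, leaving the caller to choose a fallback such as prepending to the document. It does not handle every comment edge case in the HTML specification.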