vite
vite copied to clipboard
Assets with the same content are de-duplicated at build time
Describe the bug
Hi! I have many assets like svg and their [content] is the same. After vite build, I only got two svg. The rest have been merged together. I checked the code of vite in https://github.com/vitejs/vite/blob/2b7e836f84b56b5f3dc81e0f5f161a9b5f9154c0/packages/vite/src/node/plugins/asset.ts#L325. Is this made wrong? Or is this the default behavior of rollup. How can I get all assests after build? I don't want chunk splitting.
Reproduction
https://stackblitz.com/edit/vitejs-output-file-paths-gftt9h?file=dist/manifest.json
System Info
vite 2.9.12
Used Package Manager
pnpm
Logs
No response
Validations
- [X] Follow our Code of Conduct
- [X] Read the Contributing Guidelines.
- [X] Read the docs.
- [X] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- [X] Make sure this is a Vite issue and not a framework-specific issue. For example, if it's a Vue SFC related bug, it should likely be reported to https://github.com/vuejs/core instead.
- [X] Check that this is a concrete bug. For Q&A open a GitHub Discussion or join our Discord Chat Server.
- [X] The provided reproduction is a minimal reproducible example of the bug.
I found this is not rollup's behavior. due to https://github.com/vitejs/vite/blob/2b7e836f84b56b5f3dc81e0f5f161a9b5f9154c0/packages/vite/src/node/plugins/asset.ts#L325 has cached the assests by hashcontent. How can I disable this behavior?
Looking forward to your answer
I also think that this isn't the best here. Maybe we should hash the input file name instead. It is an interesting optimization but feels to me that people should opt into it.
Isn't it a good thing that same content assets are deduped to reduce bundle size?
Of course it is very good for reducing the bundle size in SPA. But in MPA, developers want put the files together by every webpage. If some asstes are de-duplicated, we wouldn't be able to know where the assets coming from. Moreover, the manifest maybe like this:
"index/index.html": {
"file": "index/index-entry-a9d46e1c.js",
"src": "index/index.html",
"isEntry": true,
"imports": [
"_plugin-vue_export-helper-chunk-21debc91.js"
],
"css": [
"index/index.css"
],
"assets": [
"subpage/images/test.png", // this is Index but it references subpage
"index/images/1.svg"
]
},
"subpage/index.html": {
"file": "subpage/subpage-entry-4b9aeb4f.js",
"src": "subpage/index.html",
"isEntry": true,
"imports": [
"_plugin-vue_export-helper-chunk-21debc91.js"
],
"css": [
"subpage/subpage.css"
],
"assets": [
"subpage/images/logo.png",
"subpage/images/test.png"
]
},
This made me hard to deploy
The possible issues I see are:
- A user has several images with proper names (and two of them have the same content but are conceptually different). Maybe there is another process in a backend integration that grabs that images and does something with them?
- If that isn't the case and it is safe to de-duplicate, maybe a warning (that I could turn on with a comment) could be issued so the duplicated image is also removed from the repo?
There is also a perf implication of computing this hash. Maybe it is worthy, but it was surprising to me so at least we should better document it.
Thanks for the explanations. Yeah I think this makes sense to me too. It's rare for someone to have to completely equal assets, and if they did it's likely intended and we should separate them.
Regarding perf, that's a good point too. It seems like it's being used for various stuff though so it might not be easy to optimize it.
Is there a reason this is closed? I think the issue is valid.
Sorry, my bad. I thought this issue was unnecessary. I'm very sorry.
today we also got this issue. I've checked hash in CLI - the same content, but different names
$ md5 test.png
MD5 (test.png) = e6fa9f20b661ea00978dad52102d4fa1
$ md5 test3.png
MD5 (test3.png) = e6fa9f20b661ea00978dad52102d4fa1
and there are not both files in the manifest.json. May be it is possible to calculating hashes depending not only by content, but on filename, modification date or smt else ?
Is there some temp solution for this issue ?
Hey folks, I've hit this with the Laravel integration as well, and I've proposed the following fix: https://github.com/vitejs/vite/pull/9928
It is my opinion that if two files are the same e.g.
echo "SVG_CONTENT_HERE" > loader.svg > spinner.svg
That is a suitable solution to include the file in the build once, but populate the manifest with two entries.
{
"loader.svg": {
"file": "assets/loader.01234.svg",
"src": "loader.svg"
},
"spinner.svg": {
"file": "assets/loader.01234.svg",
"spinner.svg"
}
// ...
}
This means that if you are compressing images it only has to compress the one image, rather than duplicate images and also that your bundle size is small i.e. one image rather than two are ouput.
@dostrog it is my personal opinion that although your issue is related, it is a unique issue.
I wonder if Vite could offer a "hash" config option that gets the file etc. and that way for the 95% use case you don't need anything, but for the 5% when clashes occur there is an escape hatch.
export default defineConfig({
hasher: (file) => md5(file.contents) + md5(file.fileName),
// ...
};
I'm no JS expert though, so this might be a silly idea.
@dostrog it is my personal opinion that although your issue is related, it is a unique issue.
Thanks for your reply! Yeah unique and rare, but exists.
at the moment I've just add file name + generate hash (unfortunately. do not have time for full PR) and use forked repo for Vite
in asset.ts:
...
const map = assetHashToFilenameMap.get(config)!
const contentHash = getHash(content + file) // <- make hash more unique
const { search, hash } = parseUrl(id)
...
We decided today that there isn't a good use case yet to justify an option to avoid deduplication. Would you check if @timacdonald's PR is enough for your needs:
- https://github.com/vitejs/vite/pull/9928
If not, please provide more context about the particular use case. Thanks!
I've just checked with
this PR successfully resolves our issue with the same content in different files also.
Thank you guys!
Closing as it seems https://github.com/vitejs/vite/pull/9928 was enough.
@sapphi-red sorry for the late reply. i've checked the reproduction and update vite to lastest, and found this is not solved.
Here is the repro: https://stackblitz.com/edit/vitejs-output-file-paths-ylbeyd
@hemengke1997 https://github.com/vitejs/vite/issues/8632#issuecomment-1242066533 Would you provide more context about the particular use case if #9928 isn't enough?
@sapphi-red What I hope is that the duplicated assets will not be deduplicated after bundle, and the problem solved by this pr https://github.com/vitejs/vite/pull/9928 is the manifest reference. It seems that this problem can only be solved from rollup
@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?
@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?
I am trying to cache packed results. Because my project structure is multi-page, and each page is not directly related, the speed will be slower every time it is fully bundled, so I need know whether the file needs to be removed from the cache by controlling the hash
. And multi-asset merging will cause cache instability.
My need is very rare, so I don't think there is any need to modify the vite source code. At first I thought it was a vite bundle bug. But I now think merging assets is reasonable for the vast majority of projects.
Thank you for your enthusiasm and responsibility