vite icon indicating copy to clipboard operation
vite copied to clipboard

Assets with the same content are de-duplicated at build time

Open hemengke1997 opened this issue 2 years ago • 13 comments

Describe the bug

Hi! I have many assets like svg and their [content] is the same. After vite build, I only got two svg. The rest have been merged together. I checked the code of vite in https://github.com/vitejs/vite/blob/2b7e836f84b56b5f3dc81e0f5f161a9b5f9154c0/packages/vite/src/node/plugins/asset.ts#L325. Is this made wrong? Or is this the default behavior of rollup. How can I get all assests after build? I don't want chunk splitting.

Reproduction

https://stackblitz.com/edit/vitejs-output-file-paths-gftt9h?file=dist/manifest.json

System Info

vite 2.9.12

Used Package Manager

pnpm

Logs

No response

Validations

hemengke1997 avatar Jun 17 '22 07:06 hemengke1997

I found this is not rollup's behavior. due to https://github.com/vitejs/vite/blob/2b7e836f84b56b5f3dc81e0f5f161a9b5f9154c0/packages/vite/src/node/plugins/asset.ts#L325 has cached the assests by hashcontent. How can I disable this behavior?

Looking forward to your answer

hemengke1997 avatar Jun 17 '22 07:06 hemengke1997

I also think that this isn't the best here. Maybe we should hash the input file name instead. It is an interesting optimization but feels to me that people should opt into it.

patak-dev avatar Jun 17 '22 07:06 patak-dev

Isn't it a good thing that same content assets are deduped to reduce bundle size?

bluwy avatar Jun 17 '22 07:06 bluwy

Of course it is very good for reducing the bundle size in SPA. But in MPA, developers want put the files together by every webpage. If some asstes are de-duplicated, we wouldn't be able to know where the assets coming from. Moreover, the manifest maybe like this:

  "index/index.html": {
    "file": "index/index-entry-a9d46e1c.js",
    "src": "index/index.html",
    "isEntry": true,
    "imports": [
      "_plugin-vue_export-helper-chunk-21debc91.js"
    ],
    "css": [
      "index/index.css"
    ],
    "assets": [
      "subpage/images/test.png", // this is Index but it references subpage
      "index/images/1.svg"
    ]
  },
  "subpage/index.html": {
    "file": "subpage/subpage-entry-4b9aeb4f.js",
    "src": "subpage/index.html",
    "isEntry": true,
    "imports": [
      "_plugin-vue_export-helper-chunk-21debc91.js"
    ],
    "css": [
      "subpage/subpage.css"
    ],
    "assets": [
      "subpage/images/logo.png",
      "subpage/images/test.png"
    ]
  },

This made me hard to deploy

hemengke1997 avatar Jun 17 '22 07:06 hemengke1997

The possible issues I see are:

  • A user has several images with proper names (and two of them have the same content but are conceptually different). Maybe there is another process in a backend integration that grabs that images and does something with them?
  • If that isn't the case and it is safe to de-duplicate, maybe a warning (that I could turn on with a comment) could be issued so the duplicated image is also removed from the repo?

There is also a perf implication of computing this hash. Maybe it is worthy, but it was surprising to me so at least we should better document it.

patak-dev avatar Jun 17 '22 07:06 patak-dev

Thanks for the explanations. Yeah I think this makes sense to me too. It's rare for someone to have to completely equal assets, and if they did it's likely intended and we should separate them.

Regarding perf, that's a good point too. It seems like it's being used for various stuff though so it might not be easy to optimize it.

bluwy avatar Jun 18 '22 17:06 bluwy

Is there a reason this is closed? I think the issue is valid.

bluwy avatar Jun 27 '22 03:06 bluwy

Sorry, my bad. I thought this issue was unnecessary. I'm very sorry.

hemengke1997 avatar Jun 27 '22 03:06 hemengke1997

today we also got this issue. I've checked hash in CLI - the same content, but different names

$ md5 test.png
MD5 (test.png) = e6fa9f20b661ea00978dad52102d4fa1
$ md5 test3.png
MD5 (test3.png) = e6fa9f20b661ea00978dad52102d4fa1

and there are not both files in the manifest.json. May be it is possible to calculating hashes depending not only by content, but on filename, modification date or smt else ?

Is there some temp solution for this issue ?

dostrog avatar Aug 30 '22 10:08 dostrog

Hey folks, I've hit this with the Laravel integration as well, and I've proposed the following fix: https://github.com/vitejs/vite/pull/9928

It is my opinion that if two files are the same e.g.

echo "SVG_CONTENT_HERE" > loader.svg > spinner.svg

That is a suitable solution to include the file in the build once, but populate the manifest with two entries.

{
  "loader.svg": {
    "file": "assets/loader.01234.svg",
    "src": "loader.svg"
  },
  "spinner.svg": {
    "file": "assets/loader.01234.svg",
    "spinner.svg"
  }
  // ...
}

This means that if you are compressing images it only has to compress the one image, rather than duplicate images and also that your bundle size is small i.e. one image rather than two are ouput.

@dostrog it is my personal opinion that although your issue is related, it is a unique issue.

I wonder if Vite could offer a "hash" config option that gets the file etc. and that way for the 95% use case you don't need anything, but for the 5% when clashes occur there is an escape hatch.

export default defineConfig({
    hasher: (file) => md5(file.contents) + md5(file.fileName),
    // ...
};

I'm no JS expert though, so this might be a silly idea.

timacdonald avatar Sep 01 '22 03:09 timacdonald

@dostrog it is my personal opinion that although your issue is related, it is a unique issue.

Thanks for your reply! Yeah unique and rare, but exists.

at the moment I've just add file name + generate hash (unfortunately. do not have time for full PR) and use forked repo for Vite

in asset.ts:

...
    const map = assetHashToFilenameMap.get(config)!
    const contentHash = getHash(content + file) // <- make hash more unique
    const { search, hash } = parseUrl(id)
...

dostrog avatar Sep 01 '22 06:09 dostrog

We decided today that there isn't a good use case yet to justify an option to avoid deduplication. Would you check if @timacdonald's PR is enough for your needs:

  • https://github.com/vitejs/vite/pull/9928

If not, please provide more context about the particular use case. Thanks!

patak-dev avatar Sep 09 '22 14:09 patak-dev

I've just checked with

this PR successfully resolves our issue with the same content in different files also.

Thank you guys!

dostrog avatar Sep 10 '22 06:09 dostrog

Closing as it seems https://github.com/vitejs/vite/pull/9928 was enough.

sapphi-red avatar Oct 29 '22 08:10 sapphi-red

@sapphi-red sorry for the late reply. i've checked the reproduction and update vite to lastest, and found this is not solved.

Here is the repro: https://stackblitz.com/edit/vitejs-output-file-paths-ylbeyd

hemengke1997 avatar Oct 29 '22 12:10 hemengke1997

@hemengke1997 https://github.com/vitejs/vite/issues/8632#issuecomment-1242066533 Would you provide more context about the particular use case if #9928 isn't enough?

sapphi-red avatar Oct 29 '22 12:10 sapphi-red

@sapphi-red What I hope is that the duplicated assets will not be deduplicated after bundle, and the problem solved by this pr https://github.com/vitejs/vite/pull/9928 is the manifest reference. It seems that this problem can only be solved from rollup

hemengke1997 avatar Oct 29 '22 12:10 hemengke1997

@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?

sapphi-red avatar Oct 29 '22 13:10 sapphi-red

@hemengke1997 Yes I understand the difference. But we didn't understand why you need to avoid the dedupe of the actual file. Would you provide more context about that?

I am trying to cache packed results. Because my project structure is multi-page, and each page is not directly related, the speed will be slower every time it is fully bundled, so I need know whether the file needs to be removed from the cache by controlling the hash. And multi-asset merging will cause cache instability. My need is very rare, so I don't think there is any need to modify the vite source code. At first I thought it was a vite bundle bug. But I now think merging assets is reasonable for the vast majority of projects. Thank you for your enthusiasm and responsibility

hemengke1997 avatar Oct 29 '22 13:10 hemengke1997