fix(legacy): improve deterministic build of the polyfill bundle by sorting the polyfills discovered by babel
Description
The polyfill bundle generated by the legacy plugin is non deterministic, because the order of calling renderChunk() affects the input of the finally generated polyfill bundle.
This PR solves this issue by sorting the polyfills in need discovered by babel. The polyfills added directly from the plugin configuration is kept ordered the first, same as before.
Previously, https://github.com/vitejs/vite/issues/13672#issuecomment-1784594387 mentioned this issue.
Run & review this pull request in StackBlitz Codeflow.
Could we sort (and convert to array for) the polyfills before these two spots instead?
https://github.com/vitejs/vite/blob/232265783670563e34cf96240bf0e383a3653e6c/packages/plugin-legacy/src/index.ts#L269-L273
https://github.com/vitejs/vite/blob/232265783670563e34cf96240bf0e383a3653e6c/packages/plugin-legacy/src/index.ts#L306-L310
We can then pass the sorted array to
buildPolyfillChunk()below them and it should be deterministic?
Hi, thanks for your review. It could be simpler to directly sort the array here. However, I think the ordering of polyfill groups should be kept as before. Originally, the corejs polyfills specified in options.{modern,legacy}Polyfills comes the first, followed by those specified in additional{Modern,Legacy}Polyfills, and those discovered by babel. If we directly sort the combined polyfills list, this ordering will be broken, and some projects may rely on this to work properly. This proposed change keeps this ordering, and only sort the polyfills generated by babel.
Also some issues with the CI. I found that the CI sometimes fails on tests not related with legacy plugin.
See:
- https://github.com/vitejs/vite/actions/runs/8898734707
- https://github.com/vitejs/vite/actions/runs/8898145152
- https://github.com/vitejs/vite/actions/runs/8897869249
They should be running on the same changes, but I have no idea why some of the tests failed
Yeah I'd ignore the CI fails that aren't related to plugin-legacy. It's flaky lately.
If we directly sort the combined polyfills list, this ordering will be broken, and some projects may rely on this to work properly.
I did some digging and found https://github.com/babel/babel-polyfills/issues/192. So it seems like some order matters and presumably babel-preset-env will respect that when it returns. But in your PR, you're sorting the list too, which is incorrect?
In which case, the current behaviour is safer wrt ordering. Otherwise it's better to sort the entire array based based on core-js-compat/modules.json instead?
Ah, thanks for pointing it out. I'll see how to make the polyfill list stable and following the correct ordering requirement.
I did some digging and found babel/babel-polyfills#192. So it seems like some order matters and presumably babel-preset-env will respect that when it returns. But in your PR, you're sorting the list too, which is incorrect?
In which case, the current behaviour is safer wrt ordering. Otherwise it's better to sort the entire array based based on
core-js-compat/modules.jsoninstead?
Hi, I believe sorting according to core-js-compat/modules.json is definitely correct. However, by doing that, we are skipping the abstract layer provided by babel/preset-env, so I don't think it will be a clean solution.
My idea is to record all the generated polyfills and their ordering in each chunk and generate a combined list of polyfills by doing a topological sort, and generating a lexicographically minimum one to make the result stable.
However, by implementing it, as you can see from the CI result, I discovered that the order of polyfills given by babel is not strictly following the order in core-js-compat/modules.json. For example, in playground/legacy, the polyfills in the legacy chunk assets/chunk-main-legacy.!~{003}~.js is
core-js/modules/es.symbol.js
core-js/modules/es.symbol.description.js
core-js/modules/es.symbol.async-iterator.js
core-js/modules/es.symbol.iterator.js
core-js/modules/es.symbol.to-string-tag.js
core-js/modules/es.error.cause.js
core-js/modules/es.array.concat.js
core-js/modules/es.array.from.js
core-js/modules/es.array.iterator.js
core-js/modules/es.array.join.js
core-js/modules/es.array.map.js
core-js/modules/es.array.push.js
core-js/modules/es.function.name.js
core-js/modules/es.json.stringify.js
core-js/modules/es.json.to-string-tag.js
core-js/modules/es.map.js
core-js/modules/es.math.to-string-tag.js
core-js/modules/es.object.freeze.js
core-js/modules/es.object.get-prototype-of.js
core-js/modules/es.object.keys.js
core-js/modules/es.array.slice.js
core-js/modules/es.object.to-string.js
core-js/modules/es.promise.js
core-js/modules/es.set.js
core-js/modules/es.string.ends-with.js
core-js/modules/es.regexp.exec.js
core-js/modules/es.regexp.test.js
core-js/modules/es.regexp.to-string.js
core-js/modules/es.string.iterator.js
core-js/modules/es.string.raw.js
core-js/modules/esnext.iterator.constructor.js
core-js/modules/esnext.iterator.for-each.js
core-js/modules/esnext.iterator.map.js
core-js/modules/esnext.set.difference.v2.js
core-js/modules/esnext.set.intersection.v2.js
core-js/modules/esnext.set.is-disjoint-from.v2.js
core-js/modules/esnext.set.is-subset-of.v2.js
core-js/modules/esnext.set.is-superset-of.v2.js
core-js/modules/esnext.set.symmetric-difference.v2.js
core-js/modules/esnext.set.union.v2.js
core-js/modules/web.dom-collections.for-each.js
core-js/modules/web.dom-collections.iterator.js
core-js/modules/web.dom-exception.constructor.js
core-js/modules/web.dom-exception.stack.js
core-js/modules/web.dom-exception.to-string-tag.js
core-js/modules/web.structured-clone.js
core-js/modules/web.url.js
core-js/modules/web.url.to-json.js
core-js/modules/web.url-search-params.js
core-js/modules/web.url-search-params.delete.js
core-js/modules/web.url-search-params.has.js
core-js/modules/web.url-search-params.size.js
We can see core-js/modules/es.array.slice.js is ordered after core-js/modules/es.object.keys.js which is opposite to the ordering in core-js-compat/modules.json.
I wonder whether this is a bug in babel or intended.
This issue is addressed by adding a circle breaking mechanism.
Thanks for the great job. Really looking forward to this fix.
Thanks for investigating this @shankerwangmiao. Seems like babel-preset-env might have a bug with the ordering but I can't find out why.
From the current implementation, I don't understand why we need a complex sorting algorithm to sort it though. Why not do a single final sort based on core-js-compat/modules.json? The sort function should be something like this.
From the current implementation, I don't understand why we need a complex sorting algorithm to sort it though. Why not do a single final sort based on
core-js-compat/modules.json? The sort function should be something like this.
Because there is no insurance that someday babel won't add other polyfill dependency from other packages, I believe not directly relying on the core-js-compat/modules.json, but relying on solely the output of babel would be a better approach.
If babel adds another polyfill, wouldn't that still be fine? It shouldn't be having a specific order if it's not core-js, but if it needs a specific order, we can add it later too.
Perhaps what I don't understand too is that, if babel-preset-env outputs imports that aren't in the correct order, how does a sorting algorithm fix into a correct order, and especially would always respect core-js-compat/modules.json without reading it?
I can see that it's in the correct order now locally, but will it always be in the correct order if, let's say, there's only one chunk which babel outputs in an incorrect order?
If babel adds another polyfill, wouldn't that still be fine? It shouldn't be having a specific order if it's not core-js, but if it needs a specific order, we can add it later too.
Perhaps what I don't understand too is that, if
babel-preset-envoutputs imports that aren't in the correct order, how does a sorting algorithm fix into a correct order, and especially would always respectcore-js-compat/modules.jsonwithout reading it?I can see that it's in the correct order now locally, but will it always be in the correct order if, let's say, there's only one chunk which babel outputs in an incorrect order?
The basic assumption is that the ordering given by babel is (or should be) always right, no matter whether the polyfills are from core-js or other sources. We utilize this information rather than the information from core-js-compat/modules.json to keep all the polyfills in the correct order. The "correct order" means that if a polyfill A comes before B in a chunk, A will come before B in the final output; if a polyfill A and a polyfill B never appear in the same chunk, there will be no ordering constrain between them.
Even if babel is correctly outputting in the right order today, there's still the problem where additionalModernPolyfills or additionalLegacyPolyfills get merged and there are not in the right order. Having in the incorrect order is very unlikely to be intended, so we should also sort and fix it up.
I get that fixing the ordering issue wasn't the initial goal of this PR, it was to only make it deterministic. But knowing now that the bug exist, and the way to solve this is to sort based on core-js-compat/modules.json, we can do that directly to kill two birds with one stone? Otherwise we have to followup with another PR that does that.
You are right about this. My original point is to only keep the output deterministic, not to solve the ordering issue.
If we sort according to core-js-compat/modules.json, what if the user choose to include polyfills from other than those provided by core-js in additional*Polyfills? Also, currently, additional*Polyfills comes before the polyfills imported by babel, what if the user relies on the ordering?
What I had in mind is that, if there's any polyfills that aren't within core-js-compat/modules.json, then we keep the ordering as-is. However, it is more complex I think 🤔 For example:
// es.array.slice must be before es.object.keys
// Case 1:
const additional = ['core-js/modules/es.array.slice.js', 'something-else.js']
const babel = ['core-js/modules/es.object.keys.js']
const result = ['core-js/modules/es.array.slice.js', 'something-else.js', 'core-js/modules/es.object.keys.js']
// Case 2:
const additional = ['core-js/modules/es.object.keys.js', 'something-else.js']
const babel = ['core-js/modules/es.array.slice.js']
const result = ['core-js/modules/es.array.slice.js', 'core-js/modules/es.object.keys.js', 'something-else.js']
// Move es.array.slice before es.object.keys
I'm not sure if a .sort algorithm would be enough to do the trick, but I think keeping the basic idea of moving things to the front should make sure non-corejs modules are not simply moved entirely to last. In other words, non-corejs modules if it has another polyfill before it, there shouldn't be something injected between them. I might need to think about this more.
I guess a simple sort is not enough. Maybe we need topology sorting. However, I have no idea on how to deal with contradictory ordering constrains.
I guess a simple sort is not enough. Maybe we need topology sorting. However, I have no idea on how to deal with contradictory ordering constrains.
Maybe a sorting algorithm is enough. This will just sort the polyfill import statement
I had a few ideas recently to solve the ordering issue, but all are quite complex. @shankerwangmiao I'm happy to solve the deterministic issue for now, but I think we could simplify the implementation.
The forth argument of renderChunk has meta.chunks. This object is properly ordered. On the first renderChunk call, we can create a map like this:
chunkFileNameToPolyfills = new Map<string, { modern: Set<string>, legacy: Set<string> }>();
for (const fileName in meta.chunks) {
chunkFileNameToPolyfills.set(fileName, { modern: new Set(), legacy: new Set() }
}
And then later on the code in renderChunk will always only populate the sets in chunkFileNameToPolyfills. And finally in generateBundle, we can iterate chunkFileNameToPolyfills and combine it all back into the main modernPolyfills and legacyPolyfills. What do you think?
I followed your suggestion, but with a simpler solution. In renderChunk(), chunkName and polyfills are recorded in an array. In generateBundle(), we simply sort it according the chunkName, and combine the polyfills together.
The change made to the original process is simple: only the order of polyfill groups generated by chunks and pushed to {legacy,modern}Polyfills is changed.
I had a few ideas recently to solve the ordering issue, but all are quite complex. @shankerwangmiao I'm happy to solve the deterministic issue for now, but I think we could simplify the implementation.
The forth argument of
renderChunkhasmeta.chunks. This object is properly ordered. On the firstrenderChunkcall, we can create a map like this:chunkFileNameToPolyfills = new Map<string, { modern: Set<string>, legacy: Set<string> }>(); for (const fileName in meta.chunks) { chunkFileNameToPolyfills.set(fileName, { modern: new Set(), legacy: new Set() } }And then later on the code in
renderChunkwill always only populate the sets inchunkFileNameToPolyfills. And finally ingenerateBundle, we can iteratechunkFileNameToPolyfillsand combine it all back into the mainmodernPolyfillsandlegacyPolyfills. What do you think?
modernPolyfills and legacyPolyfills is just used for generate a file like
import "core-js/modules/es.regexp.exec.js";import "core-js/modules/es.regexp.test.js";import "core-js/modules/es.regexp.to-string.js";import "core-js/modules/es.string.match.js";import "core-js/modules/es.string.replace.js";
#16668
@Huodoo please read the above discussion, particularly https://github.com/vitejs/vite/pull/16566#issuecomment-2089513643. The import order matters.
ping @bluwy
I'm not sure if I missed something, but I pushed a refactor implementing https://github.com/vitejs/vite/pull/16566#issuecomment-2109604092 which I felt was simpler and has less memory-copying with the data structures. I also confirmed that the discovered modern and legacy polyfills were still the same order.
I'm not sure if I missed something, but I pushed a refactor implementing #16566 (comment) which I felt was simpler and has less memory-copying with the data structures. I also confirmed that the discovered modern and legacy polyfills were still the same order.
I also implemented that in 37c344f, before I pinged you. I think your implementation is better. I've tested the refactored version and the output remains the same.
I wonder if we are ready to have it merged.
@sapphi-red I think at the moment we had discovered that Babel doesn't necessarily return the polyfills in the right order, and there's definitely a chance the order is incorrect due to different chunks that can contain different code. But in my comment I figured that it's easier to tackle this deterministic issue first and figure out the ordering later as it can be tricky to sort with non corejs modules (sorry for the long discussion here).
Ah, I see. Then, I don't think this will break things so I think it's fine.