esbuild icon indicating copy to clipboard operation
esbuild copied to clipboard

RegExp transformation breaks tree-shaking

Open iainmerrick opened this issue 10 months ago • 3 comments

I'm building an es6 bundle that imports valibot. I found that the bundled output contains some unreferenced RegExp constants. Here's a link that shows the problem:

https://esbuild.github.io/try/#dAAwLjI1LjIAeyB0YXJnZXQ6ICJlczYiLCB0cmVlU2hha2luZzogdHJ1ZSB9AC8vIFJlZ2V4ZXMgY29waWVkIGZyb20gdmFsaWJvdCAtLSBodHRwczovL2dpdGh1Yi5jb20vZmFiaWFuLWhpbGxlci92YWxpYm90L2Jsb2IvNjc2OTJkZGMyYTJhYmFhMTE2ZmZiY2M3YmY5MDk5MGJiM2ZiMzcwMi9saWJyYXJ5L3NyYy9yZWdleC50cyNMNDAKCmNvbnN0IEVNQUlMX1JFR0VYID0gL15bXHcrLV0rKD86XC5bXHcrLV0rKSpAW1xkYS16XSsoPzpbLi1dW1xkYS0gIHpdKykqXC5bYS16XXsyLH0kL2l1OwoKdmFyIEVNT0pJX1JFR0VYID0gL14oPzpbXHV7MUYxRTZ9LVx1ezFGMUZGfV17Mn18XHV7MUYzRjR9W1x1e0UwMDYxfS1cdXtFMDA3QX1dezJ9W1x1e0UwMDMwfS1cdXtFMDAzOX1cdXtFMDA2MX0tXHV7RTAwN0F9XXsxLDN9XHV7RTAwN0Z9fCg/Olxwe0Vtb2ppfVx1RkUwRlx1MjBFMz98XHB7RW1vamlfTW9kaWZpZXJfQmFzZX1ccHtFbW9qaV9Nb2RpZmllcn0/fFxwe0Vtb2ppX1ByZXNlbnRhdGlvbn0pKD86XHUyMDBEKD86XHB7RW1vaml9XHVGRTBGXHUyMEUzP3xccHtFbW9qaV9Nb2RpZmllcl9CYXNlfVxwe0Vtb2ppX01vZGlmaWVyfT98XHB7RW1vamlfUHJlc2VudGF0aW9ufSkpKikrJC91OwoKCmNvbnNvbGUubG9nKCJIZWxsbyIpOwo

Note that EMAIL_REGEX is correctly stripped from the output, but EMOJI_REGEX remains.

I think what's happening is that the regex constant is being transformed to a new RegExp() expression as noted on https://esbuild.github.io/content-types/#javascript:

Unsupported regular expression literals are transformed into a new RegExp() constructor call so you can bring your own polyfill library to get them to work anyway.

The reference to global RegExp is counted as a possible side-effect, therefore it isn't stripped.

Possibly this is just a "working as intended", but... In this case, there definitely isn't a side-effect, and that line would have been stripped if it wasn't transformed (eliminating the need to transform it in the first place).

It would be good if esbuild could dead-strip that line when targeting es6. Maybe this could be done by dead-stripping before code transforms take place (though I imagine that would be a major architectural change), or maybe just by (configurable?) special handling for RegExp.

iainmerrick avatar Apr 09 '25 10:04 iainmerrick

there definitely isn't a side-effect

By using --target=es6, you are telling esbuild that this regular expression may be run in an ES6 environment that doesn't support the \p syntax. In that case, the constructor will throw an error, which is something that esbuild considers to be a side effect. That's why this is currently happening.

To work around this, you can tell esbuild that you're running your code in an ES6 environment that supports the \p syntax. Use esbuild's supported API with supported: { 'regexp-unicode-property-escapes': true } like this.

As for fixing this in esbuild itself: I understand that this is in code that isn't used, and that including it here isn't desirable. I'll have to think about the implications of potentially ignoring this side effect to get this to work. It's a bit of a weird case because esbuild is transforming something that would be a parse-time syntax error into a run-time error (so that you could potentially polyfill RegExp to make this work in ES6), so it's arguable whether or not this should be considered a run-time side effect.

evanw avatar Apr 09 '25 13:04 evanw

You're right, the actual problem we were hitting was the new RegExp throwing an exception. I was surprised because I was pretty sure it was in code we weren't using and therefore should have been dead-stripped, and boiled it down to the case above.

So esbuild is correct that there was an important side-effect! What a twisty little problem.

Thanks for the workaround. I was able to avoid the problem in our app by adjusting our imports.

iainmerrick avatar Apr 09 '25 13:04 iainmerrick

Thinking about this some more, the real problem is that a runtime crash can sneak into your build.

If there were an option to flag it as a bundle-time error instead, I think that would solve the problem. The user could then decide whether to fix their imports, polyfill RegExp, or something else.

Could there be an option similar to supported: { 'regexp-unicode-property-escapes': true } that causes the build to fail? In this case, I'd want to ban any cases where regexp literals are changed to new RegExp.

iainmerrick avatar May 07 '25 10:05 iainmerrick