Text loader doesn't remove byte order mark (BOM)

Open floyd-may opened this issue 1 year ago • 2 comments

When loading files using the text loader, the loader doesn't strip a byte order mark (BOM) from the beginning of the file. For HTML files, for instance, this can cause awkward problems, such as an HTML entity like &#xFEFF; (the BOM character, U+FEFF) being inadvertently inserted into the DOM. Example here:

https://esbuild.github.io/try/#YgAwLjI0LjAALS1idW5kbGUKLS1mb3JtYXQ9ZXNtCi0tb3V0ZmlsZT1vdXQuanMKLS1zb3VyY2VtYXAKLS1kcm9wLWxhYmVsczpERUJVRwotLW1pbmlmeS1pZGVudGlmaWVycwotLWxvYWRlcjouaHRtbD10ZXh0AGUAZW50cnkudHMAaW1wb3J0IGZpbGVUZXh0IGZyb20gIi4vZXhhbXBsZS5odG1sIjsKCmNvbnNvbGUubG9nKGZpbGVUZXh0KTsAAGV4YW1wbGUuaHRtbAD+u788ZGl2PmhlbGxvIHdvcmxkPC9kaXY+

Bear in mind that the example shows the text content of the HTML file as: [image]

Whereas loading an HTML file with a BOM at the beginning in any reasonable text editor won't show that leading BOM.

I can work around it by ensuring that no files loaded with the text loader have BOMs, but it does seem reasonable for the text loader to strip a leading BOM.
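As a rough illustration of what stripping a leading BOM amounts to on the consuming side, here is a minimal sketch. `stripBom` is a hypothetical helper, not part of esbuild, and the `fileText` constant only stands in for a string imported through `--loader:.html=text` from a file saved with a BOM.

```typescript
// Hypothetical helper (not an esbuild API): drop a single leading U+FEFF, if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Stand-in for the string the text loader returns for a BOM-prefixed HTML file.
const fileText = "\uFEFF<div>hello world</div>";

// The cleaned string no longer starts with the invisible BOM character.
const html = stripBom(fileText);
console.log(JSON.stringify(html)); // "<div>hello world</div>"
```

Only the first character is checked, so a U+FEFF that appears later in the file is left untouched.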

Oct 07 '24 16:10 floyd-may

And I'm also glad to make an attempt at a PR if the maintainer(s) agree that BOMs should be stripped by the text loader.

Oct 08 '24 00:10 floyd-may

I fixed it by simply converting the HTML file from UTF-8 with BOM to UTF-8 (without BOM).
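For anyone who wants to script that conversion instead of doing it in an editor, a minimal sketch at the byte level follows. `stripUtf8Bom` and `UTF8_BOM` are hypothetical names; the only fact relied on is that the UTF-8 BOM is the byte sequence EF BB BF.

```typescript
// UTF-8 encoding of U+FEFF, the byte order mark.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Hypothetical helper: return the bytes without a leading UTF-8 BOM, if there is one.
function stripUtf8Bom(bytes: Uint8Array): Uint8Array {
  const hasBom = bytes.length >= 3 && UTF8_BOM.every((b, i) => bytes[i] === b);
  return hasBom ? bytes.subarray(3) : bytes;
}

// The bytes of "<p>" as saved in "UTF-8 with BOM".
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x3c, 0x70, 0x3e]);
const withoutBom = stripUtf8Bom(withBom);
console.log(withoutBom.length); // 3
```

In Node.js, the input bytes could come from `fs.readFileSync(path)` and the result could be written back with `fs.writeFileSync(path, bytes)`.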

Oct 08 '24 13:10 SinnerAir

I think this change makes sense (and is trivial, so no need for a PR) but I could see it breaking things with code that relies on this (or that works around it), so I'm going to wait until a breaking change release to do this.

Dec 20 '24 01:12 evanw

Could this possibly be opt-in behavior at first, with the default changed (or the configurability removed altogether) in the next breaking change release?
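Until something like that exists, opting in on the user side could look roughly like the plugin sketch below. This is not an esbuild feature: `stripBomTextPlugin` and `BuildLike` are hypothetical names, the local types are simplified stand-ins for the parts of esbuild's plugin API used here (`onLoad` with a `filter`, returning `contents` and `loader`), and `readText` is injected so the sketch doesn't depend on Node.js.

```typescript
// Simplified stand-ins for the parts of esbuild's plugin API used in this sketch (assumption).
type OnLoadArgs = { path: string };
type OnLoadResult = { contents: string; loader: "text" };
type BuildLike = {
  onLoad(
    options: { filter: RegExp },
    callback: (args: OnLoadArgs) => Promise<OnLoadResult>
  ): void;
};

// Hypothetical opt-in plugin: files matching `filter` are loaded as text
// with a single leading BOM (U+FEFF) removed.
function stripBomTextPlugin(
  filter: RegExp,
  readText: (path: string) => Promise<string>
) {
  return {
    name: "strip-bom-text",
    setup(build: BuildLike): void {
      build.onLoad({ filter }, async (args) => {
        const text = await readText(args.path);
        const contents = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
        return { contents, loader: "text" };
      });
    },
  };
}
```

With Node.js, `readText` could be `(p) => fs.promises.readFile(p, "utf8")`, and the returned object would go in the `plugins` array passed to `esbuild.build`. Files that don't match `filter` keep the current loader behavior, which is what makes this opt-in.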

Dec 20 '24 01:12 floyd-may