Operating on source text with TextEncoder is error-prone when dealing with special characters like the Byte-Order-Mark

Open mozfreddyb opened this issue 3 years ago • 0 comments

Hi,

As the spec co-author of SRI and maintainer of srihash.org, I was curious enough to look at your source code and noticed a pattern that was really hard to debug on our end, so here's my write-up, in the hope of making things a bit easier on your end.

Background: The website srihash.org provides an input field and computes the integrity metadata (shaXYZ-1234) for you. Oddly, some resources hashed really differently on the website than they did with curl | openssl ....

What we found is that we had been using the TextEncoder interface, which was swallowing the byte-order-mark character. E.g., when passing JavaScript source code like \xFEFFfoo into the TextEncoder API, you would receive the ArrayBuffer for foo.
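Here is a minimal sketch of how the BOM can go missing in a text round-trip. It assumes the source string was obtained by decoding bytes first (with TextDecoder here; response.text() decodes the same way), which is where the leading BOM is dropped by default. The variable names are made up for illustration.

```javascript
// Bytes of a UTF-8 file that starts with a byte-order mark, followed by "foo".
const original = new Uint8Array([0xef, 0xbb, 0xbf, 0x66, 0x6f, 0x6f]);

// Decoding to text drops a leading BOM by default (ignoreBOM is false).
const text = new TextDecoder("utf-8").decode(original);

// Re-encoding only sees "foo", so the three BOM bytes never come back.
const roundTripped = new TextEncoder().encode(text);

console.log(original.length, roundTripped.length); // 6 3
```

Any hash computed over roundTripped therefore describes a different byte sequence than the one the server actually delivers.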

We fixed the issue by not encoding things ourselves, but fetch()ing and awaiting the response's .arrayBuffer(). Operating on that ArrayBuffer with crypto.subtle.digest() worked.

Further info is in our commit and our related issue https://github.com/mozilla/srihash.org/issues/524

Mar 11 '22 13:03 mozfreddyb