brave-mouse
Add charset support
I made this Node module to detect file character sets: https://www.npmjs.com/package/detect-charset
It's imperfect in that every file without a byte order mark (BOM) is assumed to be latin1/utf-8. I think the only general way to improve on that is to return "unknown" for utf-8 and other BOM-less files that contain Unicode.
Let me know if the module requires any improvements/changes to work with brave-mouse. Pull requests welcome.
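For anyone following along, here is a minimal sketch of the BOM-based approach described above. It is not the actual detect-charset code: the function name `detectCharsetFromBom` and the returned labels are made up for illustration, and the fallback (report utf-8 for pure-ASCII input, "unknown" for anything else without a BOM) is only my reading of the improvement suggested above.

```javascript
// Hypothetical helper for illustration; NOT the detect-charset API.
// Takes a Buffer (or any Uint8Array) and returns a charset label.
function detectCharsetFromBom(buf) {
  // UTF-32 is checked before UTF-16 because the UTF-32LE BOM
  // (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
  const boms = [
    ['utf-32be', [0x00, 0x00, 0xfe, 0xff]],
    ['utf-32le', [0xff, 0xfe, 0x00, 0x00]],
    ['utf-8', [0xef, 0xbb, 0xbf]],
    ['utf-16be', [0xfe, 0xff]],
    ['utf-16le', [0xff, 0xfe]],
  ];

  for (const [name, bytes] of boms) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return name;
    }
  }

  // No BOM. Assumption: pure-ASCII bytes decode identically as latin1
  // and utf-8, so reporting utf-8 is safe; any byte >= 0x80 is ambiguous
  // without further analysis, so report "unknown" instead of guessing.
  for (const byte of buf) {
    if (byte >= 0x80) {
      return 'unknown';
    }
  }
  return 'utf-8';
}
```

A statistical detector would be needed to do better than "unknown" on BOM-less input that contains bytes above 0x7F.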
I have experimented a bit with ICU’s charset detector (using node-icu-charset-detector), which seems to be the most accurate and most battle-tested charset detector out there. However, I’d want to avoid users having to brew install icu4c. I’m currently trying to compile ICU using node-gyp, which would probably be my preferred solution.
I have marked the test cases which fail using detect-charset but should work fine using ICU’s detector, if you’d like to take a look at them.
In case anyone’s interested in this, I have released detect-character-encoding which compiles ICU using node-gyp.