hsd
Add unicode lint - allow only ASCII symbols
Motivation
The motivation for this PR is the recent attack using Unicode:
- https://www.trojansource.codes/
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42574
Approaches to addressing the issue vary; a recent example:
- https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html - where they went with deny-by-default for the specific Unicode symbols.
Another Unicode-based issue: https://certitude.consulting/blog/en/invisible-backdoor/
We can be even stricter with our lints: this PR is an example of allowing only ASCII symbols, with separate allowed lists kept for the exceptions that appear out of necessity. This is in case something similar is discovered with a different set of code points.
Details
Simplest form of the allowed list check:

```sh
grep $'[^\u0020-\u007e\t]' ./**/*.js | grep -v '^\./lib/hd/words/\(french\|italian\|japanese\|spanish\|chinese-traditional\|chinese-simplified\)\.js'
```
Simple form of the disallowed list check:

```sh
grep $'[\u202A-\u202E\u2066-\u2069]' ./**/*.js
```
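These checks can be wired into a failing lint step, roughly like the sketch below. This is only an illustration of the idea, not the actual script/unicode-lint.sh added in this PR:

```sh
#!/usr/bin/env bash
# Illustrative sketch only -- NOT the actual script/unicode-lint.sh from this PR.
# Fails when any .js file contains a character outside printable ASCII + tab,
# skipping the non-English mnemonic word lists that legitimately contain non-ASCII.

# Byte-oriented matching: every byte of a UTF-8 multi-byte sequence falls
# outside 0x20-0x7e, so the range below catches all non-ASCII characters.
export LC_ALL=C

allowed=$'[^\u0020-\u007e\t]'   # bash >= 4.2 expands \uXXXX inside $'...'
words='^\./lib/hd/words/\(french\|italian\|japanese\|spanish\|chinese-traditional\|chinese-simplified\)\.js'

if grep -rn "$allowed" --include='*.js' . | grep -v "$words"; then
  echo 'Disallowed (non-ASCII) symbols found, see matches above.' >&2
  exit 1
fi

echo 'unicode lint passed'
```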
Future considerations
There are still files that I had to ignore for this:
- Mnemonic words - any future PR that affects those files (there should not be any, though) or adds new language support needs to go through different checks; maybe we can have `disallowed_list` checks for those (a sketch follows below).
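For example, a minimal sketch of such a `disallowed_list` check scoped to the word lists could look like this. The control-character set simply mirrors the disallowed-list grep above; it is an illustration, not something this PR adds:

```sh
#!/usr/bin/env bash
# Illustrative sketch: the mnemonic word lists are exempt from the ASCII-only
# rule, but we can still reject bidirectional control characters in them.
# Requires a UTF-8 locale so the multi-byte characters in the class are
# matched as characters rather than as individual bytes.

disallowed=$'[\u202A-\u202E\u2066-\u2069]'   # bash >= 4.2 expands \uXXXX

if grep -Hn "$disallowed" ./lib/hd/words/*.js; then
  echo 'Bidirectional control characters found in mnemonic word lists.' >&2
  exit 1
fi
```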
Even though script/unicode-lint.sh itself is not ignored, changes to this file need a more careful review of what gets added to the allowed list, or the older unicode-lint should be used to lint the new one before executing it.
Other projects
This currently does not include other projects; the only thing it checks is node_modules, as of now. But we could potentially have this linter in a separate repo and add it to all other projects as well.
Note
This is a little bit hacky version that is pretty slow, and we can probably do better! Any contributions, or even a different implementation altogether, are welcome.
Pull Request Test Coverage Report for Build 1686961484
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 1205 unchanged lines in 7 files lost coverage.
- Overall coverage increased (+0.2%) to 62.736%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| lib/covenants/namestate.js | 13 | 89.5% |
| lib/dns/server.js | 38 | 86.53% |
| lib/wallet/rpc.js | 138 | 28.0% |
| lib/wallet/wallet.js | 185 | 74.48% |
| lib/blockchain/chain.js | 253 | 68.09% |
| lib/node/rpc.js | 283 | 30.76% |
| lib/mempool/mempool.js | 295 | 63.15% |
| Total: | 1205 | |
| Totals | |
|---|---|
| Change from base Build 1521306511: | 0.2% |
| Covered Lines: | 21334 |
| Relevant Lines: | 31814 |
💛 - Coveralls
fun fact: BSD `grep` on OSX doesn't work with this script, gotta install GNU `grep` with homebrew...
Oh, I will revert this back to find + grep then, or will check what needs to change. This is just much quicker.
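For reference, the find + grep form might look roughly like this; it is only a guess at the shape, not the code that ends up in the PR, and the character class still relies on bash's `$'\u...'` quoting:

```sh
# Rough guess at a find + grep form of the allowed-list check (not the final
# script): find enumerates the files instead of the ./**/*.js shell glob.
find . -type f -name '*.js' -print0 \
  | xargs -0 grep -Hn $'[^\u0020-\u007e\t]' \
  | grep -v '^\./lib/hd/words/\(french\|italian\|japanese\|spanish\|chinese-traditional\|chinese-simplified\)\.js'
```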
On both Ubuntu 20 and OSX the script is finding "not allowed symbols" in a ton of files... I'm trying to figure out why
Still getting these errors on both machines... I'm not sure how to proceed with this. Could be differences in shell versions or something? I'm starting to wonder if we should write a short program in C to scan all text files for unicode bytes...
OK I've got a great little C program for this:
https://github.com/pinheadmz/unicode-comb
It scans everything, including node modules! I've tested it by pasting one of the bidirectional characters into random files.