cbor-js
cbor-js copied to clipboard
Fix handling of codepoints >= 0xe000
Codepoints in [U+E000,U+FFFF] were incorrectly handled by the surrogate pair code path.
Example of behavior before the fix:
duk> CBOR.encode('\ue000')
= |64f0908080|
duk> CBOR.decode(CBOR.encode('\ue000'))
= "\ud800\udc00"
duk> CBOR.decode(CBOR.encode('\uffff'))
= "\udbff\udc00"
After the fix:
duk> CBOR.encode('\ue000')
= |63ee8080|
duk> CBOR.decode(CBOR.encode('\ue000'))
= "\ue000"
duk> CBOR.decode(CBOR.encode('\uffff'))
= "\uffff"
Cbor.me gives the following for encoding "\ue000"
, matching the fixed output:
63 # text(3)
EE8080 # "\xEE\x80\x80"
Coverage remained the same at 99.262% when pulling 0f1ed4ad1fb48bdf6cba529b283bcb74a6c8cb9e on svaarala:fix-cp-above-dfff into 65dc49611107db83aff8308a6b381f4d7933824b on paroga:master.
This bug is the cause of failure to properly roundtrip emoji with modifiers such as ☝️ and 👆🏿, which are both U+261D with modifier️s U+FE0F or U+1F3FF respectively.
When I apply this change the en/decode roundtrip is successful. Without the change it fails to encode the modifier.
Example code:
let value = '_☝️_👆🏿_'; console.log(value, cbor.decode(cbor.encode(value)));
Without fix:
With fix:
@paroga It looks like this lib needs some love. Would you be up for handing the reins to someone else either temporarily or permanently? I'd be happy to review the existing PRs and Issues, either in this repo or (if you'd prefer) in a repo under my work org.
I'd also be happy to help in whatever way I can. I'm working with Duktape's native CBOR binding so I could help at least with bug fixing and maybe improve test coverage (Duktape's CBOR binding is based on this library, so many of the testcases in Duktape repo can be used as is).