Audit unicode-normalization

Open evanjs opened this issue 4 years ago • 2 comments

unicode-normalization (GitHub, crates.io): another widely-used crate.

Discovered some unsafe expressions when checking url (#51).

Once again, no clue if any of these are safe.

Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    :) = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ?  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    !  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        20/20        0/0    0/0     0/0      !  unicode-normalization 0.1.8
2/2        322/322      4/4    1/1     13/13    !  └── smallvec 0.6.12c

Example unsafe usage in decompose_hangul.

evanjs, Nov 03 '19 22:11

Shnatsel's pull request to replace smallvec with tinyvec was recently accepted, so that dependency is no longer a problem.

The unsafe code in decompose_hangul can be removed by replacing the from_u32_unchecked call with from_u32 and handling the Option<char> it returns, but that may have unforeseen performance or behavioural consequences. The change does pass the project's tests, however.
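For illustration, here is a minimal sketch of what a safe variant might look like. It is not the crate's actual code: the function name decompose_hangul_safe and its tuple return type are hypothetical, and the constants are the Hangul syllable values from the Unicode Standard. The only point is that char::from_u32 returns an Option<char>, so each invalid value becomes a None instead of undefined behaviour.

```rust
// Hangul syllable constants as defined by the Unicode Standard.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Hypothetical safe variant (name and signature are assumptions).
/// Returns the leading consonant, the vowel and an optional trailing
/// consonant, or None if `s` is not a precomposed Hangul syllable.
fn decompose_hangul_safe(s: char) -> Option<(char, char, Option<char>)> {
    // checked_sub rejects code points below S_BASE instead of wrapping.
    let s_index = (s as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    // from_u32 validates each value, so no `unsafe` is needed.
    let l = char::from_u32(L_BASE + s_index / N_COUNT)?;
    let v = char::from_u32(V_BASE + (s_index % N_COUNT) / T_COUNT)?;
    let t_index = s_index % T_COUNT;
    let t = if t_index > 0 {
        Some(char::from_u32(T_BASE + t_index)?)
    } else {
        None
    };
    Some((l, v, t))
}
```

Calling decompose_hangul_safe('\u{D55C}') yields U+1112, U+1161 and U+11AB, while a non-Hangul input such as 'a' yields None. Whether the extra validation is measurable in practice would need benchmarking.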

CuriouslyCurious, Mar 23 '20 17:03

I can't see a way to avoid unsafe character creation there without a performance hit, but we could at least add debug assertions and then plug the function into a fuzzer to see if they actually hold.
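A rough sketch of the debug-assertion idea, under some assumptions: the names decompose_hangul_checked and check_all_syllables are hypothetical, the callback shape is only modelled on the existing function, and the constants come from the Unicode Standard. Because there are only 11172 precomposed syllables, an exhaustive loop stands in for a fuzzer here; a real fuzzing harness would be a separate piece of work.

```rust
// Hangul syllable constants as defined by the Unicode Standard.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = 21 * T_COUNT; // 588
const S_COUNT: u32 = 19 * N_COUNT; // 11172

/// Hypothetical variant that keeps the unchecked conversion but verifies
/// its assumptions in debug builds. Precondition: `s` must be a
/// precomposed Hangul syllable (U+AC00..=U+D7A3).
fn decompose_hangul_checked<F: FnMut(char)>(s: char, mut emit_char: F) {
    debug_assert!(
        (S_BASE..S_BASE + S_COUNT).contains(&(s as u32)),
        "not a precomposed Hangul syllable: {:?}",
        s
    );
    let s_index = s as u32 - S_BASE;
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    let t = T_BASE + t_index;
    // These checks compile away in release builds.
    debug_assert!(char::from_u32(l).is_some());
    debug_assert!(char::from_u32(v).is_some());
    debug_assert!(char::from_u32(t).is_some());
    // SAFETY: given the precondition, l <= 0x1112, v <= 0x1175 and
    // t <= 0x11C2, all of which are valid Unicode scalar values.
    unsafe {
        emit_char(char::from_u32_unchecked(l));
        emit_char(char::from_u32_unchecked(v));
        if t_index > 0 {
            emit_char(char::from_u32_unchecked(t));
        }
    }
}

/// Runs every precomposed syllable through the function and returns the
/// number of characters emitted; panics in debug builds if an assertion fails.
fn check_all_syllables() -> usize {
    let mut emitted = 0;
    for cp in S_BASE..S_BASE + S_COUNT {
        // The syllable block lies below the surrogate range.
        let s = char::from_u32(cp).expect("valid scalar value");
        decompose_hangul_checked(s, |_| emitted += 1);
    }
    emitted
}
```

Running check_all_syllables() in a debug build exercises all of the assertions. It says nothing about callers that pass non-Hangul input, which is where a fuzzer over the public API would still help.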

Shnatsel, Mar 23 '20 18:03