encoding_rs
encoding_rs copied to clipboard
Unsafe review notes
I'm performing an unsafe review of encoding_rs. The hope is by the end of this I will have safety documentation for all unsafe blocks in the crate.
Opening this issue to keep track of things I have found, especially things I do not plan to fix with documentation (but may fix later).
The branch with the documentation is https://github.com/manishearth/encoding_rs/tree/unsafe (compare)
Existing UB: https://github.com/hsivonen/encoding_rs/issues/79
One thing I think that can be done is that the constants in ascii.rs can be cleaned up greatly.
I believe the following works:
pub const ALU_STRIDE_SIZE: usize = mem::size_of::<usize> * 2;
pub const MAX_STRIDE_SIZE: usize = ALU_STRIDE_SIZE;
pub const ALU_ALIGNMENT: usize = mem::align_of::<usize>();
pub const ALU_ALIGNMENT_MASK: usize = ALU_ALIGNMENT - 1;
For SIMD, I think we can do
pub const SIMD_ALIGNMENT = align_of::<u16x8>();
and assert in a constant that it is equivalent to align_of::<u8x16>()
The store8_unaligned
in copy_bmp_to
could probably be aligned since we're writing to an aligned u16 slice
if SIMD_STRIDE_SIZE
is a multiple of 4 (it is).
Utf16Destination/Utf8Destination are extremely leaky abstractions, safety wise. Their safety depends heavily on how, precisely, they are used, but they're not unsafe
.
I'm not sure if they can be properly reviewed without adding unsafe
there
I think the way to handle Utf16/Utf8Destination is to change the handle types to be "checked for one write" and "checked for two writes" (etc), and have them call each other. I think I'll do this as a separate PR, but I am convinced these are safe now.
They're fiddly enough that I think a refactoring would help in the long run. The invariants are currently scattered across the file.