gbwt-rs icon indicating copy to clipboard operation
gbwt-rs copied to clipboard

Is Record::from_raw_parts() actually unsafe?

Open Manishearth opened this issue 3 months ago • 4 comments

https://github.com/jltsiren/gbwt-rs/blob/cd78e7f8f529866ee1f260db6261817f840ef5cb/src/bwt.rs#L360C19-L360C33

While unsafe reviewing gbz-base I noticed that Record::from_raw_parts() has nontrivial safety invariants.

However it's not clear to me if Record actually uses those invariants for safety-critical things, or if those invariants exist to avoid logic errors.

Can you clarify what the worst case scenario of incorrect usage of from_raw_parts() is? Is it UB?

If it's not UB, could you document that on from_raw_parts()? Such unsafe functions are typically an antipattern but it's still an understandable design decision. But it helps reviewers if it's actually documented as such.

Manishearth avatar Mar 05 '24 08:03 Manishearth

The short answer is that I don't know.

We are dealing with large data structures with complex invariants. There are relevant situations where full validation of the invariants would take orders of magnitude longer than the data is needed. Full validation is almost never done, and the consequences of invalid data can be nontrivial. There could be situations where converting invalid data structures to other simple-sds structures may have safety consequences. I have identified a few potential places, but I'm not sure when I'll have the time to deal with them.

jltsiren avatar Mar 05 '24 10:03 jltsiren

Hm, surprised to hear you say that: there's very little unsafe code in this crate so I was imagining the answer is no.

Manishearth avatar Mar 05 '24 11:03 Manishearth

The fundamental issue is that some supposedly safe functions in simple-sds are not actually safe. Some assume that the input data structure is in a valid state, and use the invariants to avoid multiple redundant sanity checks in the inner loop. But if the input is invalid, the output may also be invalid, and somewhere down the line we may make an out-of-bounds access to something. I've been trying to reduce this behavior when there is no significant performance penalty, but it has not been a priority.

jltsiren avatar Mar 06 '24 06:03 jltsiren

@jltsiren ah, an out of bounds access using unchecked indexing? That would do it.

Worth filing an issue upstream, I didn't follow dependencies for safety invariants

Manishearth avatar Mar 06 '24 07:03 Manishearth