safety-dance icon indicating copy to clipboard operation
safety-dance copied to clipboard

Audit url

Open evanjs opened this issue 4 years ago • 11 comments

url: GitHub, crates.io

cargo-geiger reports numerous usages of unsafe, though only 6 seem to be local to the url crate itself.

Example unsafe usage in decode_utf8_lossy

A majority of the remaining unsafe usages consist of, e.g. unicode-normalization and smallvec (#9) Absolutely no idea if any of these unsafe cases are appropriate or not.

The url crate seems to be widely used enough to check, at least.

evanjs avatar Nov 02 '19 21:11 evanjs

Just looked at the example unsafe block you linked. I think it is safe, and potentially appropriate, but I believe there is a way it can be re-written in safe code. I'll review the rest tomorrow and submit a PR if needed.

JoshMcguigan avatar Nov 03 '19 04:11 JoshMcguigan

Submitted the PR - https://github.com/servo/rust-url/pull/560

JoshMcguigan avatar Nov 03 '19 05:11 JoshMcguigan

The comments look very nice and clear to me. Thanks for doing this!

I think one more unsafe block could be trivially rewritten as safe code, I've left a comment on the PR.

This leaves only two iterator implementations that return Some(&str) and I think those can be converted to returning Some(char) instead to avoid the unsafe conversions, with something like this:

mystring.chars().flat_map(|c| if should_percent_encode(c) {percent_encode(c).chars()} else {c})

Shnatsel avatar Nov 03 '19 11:11 Shnatsel

@Shnatsel thanks for the feedback. I've removed the unsafe block as you suggested.

This leaves only two iterator implementations that return Some(&str) and I think those can be converted to returning Some(char) instead to avoid the unsafe conversions, with something like this..

If I'm understanding correctly, these would be changes which change the public interface?

JoshMcguigan avatar Nov 03 '19 13:11 JoshMcguigan

Those structs seem to be private, so I think that wouldn't be a breaking change.

Shnatsel avatar Nov 03 '19 13:11 Shnatsel

https://github.com/servo/rust-url/blob/master/src/form_urlencoded.rs#L125

This is one example of an iterator with an unsafe block. But ByteSerialize is marked as public in the form_urlencoded module, then src/lib.rs has pub mod form_urlencoded. I believe the other case of this is similar. Are you referring to some other structs? Or am I misinterpreting the Rust public/private rules?

JoshMcguigan avatar Nov 03 '19 14:11 JoshMcguigan

Nah, it's probably me who misinterpreted the rules. Those structs indeed seem to be public, so changing the return type would indeed be a breaking change. Apologies for the confusion.

Shnatsel avatar Nov 03 '19 14:11 Shnatsel

In addition, I figured we might want to check idna and percent_encoding as well, but wasn't sure how those should be handled.

i.e., I'm assuming we're doing a PR per project (e.g. rust-url/url, rust-url/idna), vs a PR per repo (e.g. rust-url/*)?

evanjs avatar Nov 03 '19 22:11 evanjs

PRs are easier per project. Tracking issues... dunno, per repo probably, because that's what I'd grep for unsafe

Shnatsel avatar Nov 03 '19 22:11 Shnatsel

For what its worth, my PR covers the full rust-url repo (but not dependencies).

JoshMcguigan avatar Nov 03 '19 23:11 JoshMcguigan

For what its worth, my PR covers the full rust-url repo (but not dependencies).

Yup. Just looked through it a minute ago and realized it was comprehensive.

Regardless, I'll keep this in mind moving forward, and include checklists for bigger projects in case we need to break things up.

Thanks @Shnatsel and @JoshMcguigan!

evanjs avatar Nov 03 '19 23:11 evanjs