perf: remove loop from `str::floor_char_boundary`
rust-lang/rust#144472 made str::floor_char_boundary a const function, but in doing so introduced a loop. This is unnecessary because the next UTF-8 character boundary can only be within a 4-byte range.
This produces excessive code for e.g. str.floor_char_boundary(20), because the loop is unrolled.
https://godbolt.org/z/5f3YsM6oK
This PR replaces the loop with 3 checks in series.
In addition to reducing code size in some cases, it also removes bounds checks from calling code when following floor_char_boundary with a call to String::truncate (which I assume might be a common pattern).
Notes
-
I tried using
index.unchecked_sub(1), but found it doesn't remove bounds checks, whereasindex.checked_sub(1).unwrap_unchecked()does. Surprising! -
The
assert_uncheckeds are required to elide checks from code following thefloor_char_boundarycall e.g.:
let index = string.floor_char_boundary(20);
string.truncate(index);
If this PR is accepted, I'll follow up with a similar PR for ceil_char_boundary.
Very happy to receive feedback and amend.
r? @scottmcm
rustbot has assigned @scottmcm. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
Rather than manually unrolling the code here, it might be worth considering if the original implementation can be made const instead:
https://github.com/rust-lang/rust/blob/03b17b181af4945fa24e0df79676e89454546440/library/core/src/str/mod.rs#L244-L247
I had this exact thought in the original implementation, which is why I explicitly only checked the four following bytes for a character boundary. Note that I haven't fully scrutinised the code and yours may be better regardless, but I figured I'd mention this for some extra context. I would expect LLVM to automatically unroll a 4-max-iteration loop in most contexts.
You might want to add a codegen test under tests/codegen-llvm to make sure the panics/slice fail checks that you're targeting are optimized out.
At the time, I tried to add a counter variable to limit the number of iterations but I wasn't able to remove the panics like the current implementation does.
slice_new and truncate_new still have panic paths when the index is variable. Variable index slice doesn't have a panic in the current implementation. Variable truncate has a panic for both implementations.
https://godbolt.org/z/T1TexYbE7
I wrote an example codegen test to help with testing this out (others may be able to improve it further).
You can run the tests with ./x test [path to file or directory] (save time by iterating over the IR/assembly in godbolt first).
Again, the variable truncate may not be possible to enable currently.
https://rustc-dev-guide.rust-lang.org/tests/compiletest.html#codegen-tests
Example codegen test file
// Test that panics and slice errors are elided from `floor_char_boundary` implementation.
//@ compile-flags: -Copt-level=3
#![crate_type = "lib"]
// CHECK-LABEL: @floor_fixed_index
#[no_mangle]
pub fn floor_fixed_index(s: &str) -> usize {
// CHECK-NOT: panic
s.floor_char_boundary(20)
}
// CHECK-LABEL: @floor_variable_index
#[no_mangle]
pub fn floor_variable_index(s: &str, i: usize) -> usize {
// CHECK-NOT: panic
s.floor_char_boundary(i)
}
// CHECK-LABEL: @floor_slice_fixed_index
#[no_mangle]
pub fn floor_slice_fixed_index(s: &str) -> &str {
// CHECK-NOT: panic
// CHECK-NOT: slice_error_fail
let index = s.floor_char_boundary(20);
&s[..index]
}
// CHECK-LABEL: @floor_slice_variable_index
#[no_mangle]
pub fn floor_slice_variable_index(s: &str, i: usize) -> &str {
// CHECK-NOT: panic
// CHECK-NOT: slice_error_fail
let index = s.floor_char_boundary(i);
&s[..index]
}
// CHECK-LABEL: @truncate_from_floor_fixed_index
#[no_mangle]
pub fn truncate_from_floor_fixed_index(s: &mut String) {
// CHECK-NOT: panic
let index = s.floor_char_boundary_new(20);
s.truncate(index)
}
// CHECK-LABEL: @truncate_from_floor_variable_index
#[no_mangle]
pub fn truncate_from_floor_variable_index(s: &mut String, i: usize) {
// CHECK-NOT: panic
let index = s.floor_char_boundary_new(i);
s.truncate(index)
}
slice_newandtruncate_newstill have panic paths when the index is variable. Variable indexslicedoesn't have a panic in the current implementation. Variabletruncatehas a panic for both implementations.
Yes, I noticed this today. Am hoping to eliminate all these panics. I feel like it should be possible, but... mysteries of the compiler...
- I tried using
index.unchecked_sub(1), but found it doesn't remove bounds checks, whereasindex.checked_sub(1).unwrap_unchecked()does. Surprising!
This is the long-frustrating problem that sub nuw i32 %1, 1 gets turned into add i32 %1, -1 by LLVM and thus completely loses the "it's not going to overflow" information. (cc @nikic in case this real-life example is helpful)
Whereas x.check_sub(1).unwrap_unchecked() ends up with an llvm.assume(x >= 1) which doesn't get removed.
Reminder, once the PR becomes ready for a review, use @rustbot ready.