rust icon indicating copy to clipboard operation
rust copied to clipboard

perf: remove loop from `str::floor_char_boundary`

Open overlookmotel opened this issue 1 month ago • 6 comments

rust-lang/rust#144472 made str::floor_char_boundary a const function, but in doing so introduced a loop. This is unnecessary because the next UTF-8 character boundary can only be within a 4-byte range.

This produces excessive code for e.g. str.floor_char_boundary(20), because the loop is unrolled.

https://godbolt.org/z/5f3YsM6oK

This PR replaces the loop with 3 checks in series.

In addition to reducing code size in some cases, it also removes bounds checks from calling code when following floor_char_boundary with a call to String::truncate (which I assume might be a common pattern).

Notes

  • I tried using index.unchecked_sub(1), but found it doesn't remove bounds checks, whereas index.checked_sub(1).unwrap_unchecked() does. Surprising!

  • The assert_uncheckeds are required to elide checks from code following the floor_char_boundary call e.g.:

let index = string.floor_char_boundary(20);
string.truncate(index);

If this PR is accepted, I'll follow up with a similar PR for ceil_char_boundary.

Very happy to receive feedback and amend.

overlookmotel avatar Nov 30 '25 00:11 overlookmotel

r? @scottmcm

rustbot has assigned @scottmcm. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot avatar Nov 30 '25 00:11 rustbot

Rather than manually unrolling the code here, it might be worth considering if the original implementation can be made const instead:

https://github.com/rust-lang/rust/blob/03b17b181af4945fa24e0df79676e89454546440/library/core/src/str/mod.rs#L244-L247

I had this exact thought in the original implementation, which is why I explicitly only checked the four following bytes for a character boundary. Note that I haven't fully scrutinised the code and yours may be better regardless, but I figured I'd mention this for some extra context. I would expect LLVM to automatically unroll a 4-max-iteration loop in most contexts.

clarfonthey avatar Nov 30 '25 06:11 clarfonthey

You might want to add a codegen test under tests/codegen-llvm to make sure the panics/slice fail checks that you're targeting are optimized out.

At the time, I tried to add a counter variable to limit the number of iterations but I wasn't able to remove the panics like the current implementation does.

slice_new and truncate_new still have panic paths when the index is variable. Variable index slice doesn't have a panic in the current implementation. Variable truncate has a panic for both implementations. https://godbolt.org/z/T1TexYbE7

I wrote an example codegen test to help with testing this out (others may be able to improve it further). You can run the tests with ./x test [path to file or directory] (save time by iterating over the IR/assembly in godbolt first). Again, the variable truncate may not be possible to enable currently. https://rustc-dev-guide.rust-lang.org/tests/compiletest.html#codegen-tests

Example codegen test file

// Test that panics and slice errors are elided from `floor_char_boundary` implementation.

//@ compile-flags: -Copt-level=3

#![crate_type = "lib"]

// CHECK-LABEL: @floor_fixed_index
#[no_mangle]
pub fn floor_fixed_index(s: &str) -> usize {
    // CHECK-NOT: panic
    s.floor_char_boundary(20)
}

// CHECK-LABEL: @floor_variable_index
#[no_mangle]
pub fn floor_variable_index(s: &str, i: usize) -> usize {
    // CHECK-NOT: panic
    s.floor_char_boundary(i)
}

// CHECK-LABEL: @floor_slice_fixed_index
#[no_mangle]
pub fn floor_slice_fixed_index(s: &str) -> &str {
    // CHECK-NOT: panic
    // CHECK-NOT: slice_error_fail
    let index = s.floor_char_boundary(20);
    &s[..index]
}

// CHECK-LABEL: @floor_slice_variable_index
#[no_mangle]
pub fn floor_slice_variable_index(s: &str, i: usize) -> &str {
    // CHECK-NOT: panic
    // CHECK-NOT: slice_error_fail
    let index = s.floor_char_boundary(i);
    &s[..index]
}

// CHECK-LABEL: @truncate_from_floor_fixed_index
#[no_mangle]
pub fn truncate_from_floor_fixed_index(s: &mut String) {
    // CHECK-NOT: panic
    let index = s.floor_char_boundary_new(20);
    s.truncate(index)
}

// CHECK-LABEL: @truncate_from_floor_variable_index
#[no_mangle]
pub fn truncate_from_floor_variable_index(s: &mut String, i: usize) {
    // CHECK-NOT: panic
    let index = s.floor_char_boundary_new(i);
    s.truncate(index)
}

okaneco avatar Nov 30 '25 19:11 okaneco

slice_new and truncate_new still have panic paths when the index is variable. Variable index slice doesn't have a panic in the current implementation. Variable truncate has a panic for both implementations.

Yes, I noticed this today. Am hoping to eliminate all these panics. I feel like it should be possible, but... mysteries of the compiler...

overlookmotel avatar Nov 30 '25 19:11 overlookmotel

  • I tried using index.unchecked_sub(1), but found it doesn't remove bounds checks, whereas index.checked_sub(1).unwrap_unchecked() does. Surprising!

This is the long-frustrating problem that sub nuw i32 %1, 1 gets turned into add i32 %1, -1 by LLVM and thus completely loses the "it's not going to overflow" information. (cc @nikic in case this real-life example is helpful)

Whereas x.check_sub(1).unwrap_unchecked() ends up with an llvm.assume(x >= 1) which doesn't get removed.

scottmcm avatar Dec 06 '25 08:12 scottmcm

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot avatar Dec 06 '25 09:12 rustbot