rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Trim methods on slices

Open SoniEx2 opened this issue 7 years ago • 14 comments
trafficstars

/// Trims this slice from the left.
fn trim_left_matches<F: Fn(T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    while res.len() > 0 && f(res[0]) {
        res = res[1..];
    }
    res
}

/// Trims this slice from the right.
fn trim_right_matches<F: Fn(T) -> bool>(&self, f: F) -> &[T] {
    let mut res = self;
    while res.len() > 0 && f(res[res.len()-1]) {
        res = res[..(res.len()-1)];
    }
    res
}

(and so on)

basically turns &["", "", "", "foo", ""] into &["foo", ""], &["", "foo", "", "", ""] into &["", "foo"], etc, depending on what you call.

SoniEx2 avatar Sep 22 '18 22:09 SoniEx2

These sounds like noise since you implement them trivially with code like s.split_at_mut(s.len() - s.iter().rev().filter(|x| x.len()==0).count()).0.

burdges avatar Sep 22 '18 23:09 burdges

That's even less readable. :/

Noise is having the same code snippet over and over again and not having it in a well-documented standalone function.

Also, I don't think that code of yours actually works. You probably meant to use take_while and split_at.

SoniEx2 avatar Sep 23 '18 02:09 SoniEx2

We have .skip_while() and .take_while() for iterators. Aren't those enough?

.iter().skip_while(|x| x == "").take_while(|x| x != "")

Lonami avatar Oct 09 '18 07:10 Lonami

No - that doesn't work like a trim method.

And they're not analogous to the string methods.

SoniEx2 avatar Oct 09 '18 10:10 SoniEx2

@SoniEx2 Can you give an example where this would be useful (except for Strings)?

Aloso avatar Nov 30 '18 06:11 Aloso

When you have a slice and don't want to allocate.

SoniEx2 avatar Nov 30 '18 12:11 SoniEx2

I ended up needing something like this for c-string parsing. I have a sequence of bytes and want to return the prefix containing the c-string data (not including the null terminator).

But then I realized, you can use split to do this:

fn trim_c_string(s: &[u8]) -> &[u8] {
    s.split(|&b| b == 0).next().unwrap_or(&[])
}

However, this implementation cannot eliminate the bounds check unlike the naive loop implementation:

pub fn fast_trim_c_string(s: &[u8]) -> &[u8] {
    for i in 0..s.len() {
        if s[i] == 0 {
            return s.split_at(i).0;
        }
    }
    s
}

josephlr avatar May 05 '20 07:05 josephlr

It's nice to have trim methods on str but in the project I am working on right now, I use &[char] slices instead of &str, because I need indexed access to characters and slicing of strings which &str does not support since it's UTF-8. It is disturbing that str has a .trim() method and a generic [T] slice does not. Would be really nice if this issue was resolved, all the more so it is that easy to implement.

A sample implementation looks like this though I am sure it is suboptimal.

fn trim<P>(&self, mut predicate: P) -> &[T]
where
    P: FnMut(&T) -> bool,
{
    let mut left = 0;
    let mut right = self.len();

    let mut iter = self.iter();

    while let Some(e) = iter.next() {
        if predicate(e) {
            left += 1
        } else {
            break;
        }
    }

    while let Some(e) = iter.next_back() {
        if predicate(e) {
            right -= 1
        } else {
            break;
        }
    }

    &self[left..right]
}

serid avatar May 14 '20 10:05 serid

We prefer split_* methods for slices, so as to retain access to underlying subslices, so I still think trim_* methods add noise. We could discuss some split_change(f) that does split_inclusive(|x| changed(f(x))) where

let mut previous = true;
let changed = |x| if previous == x { false } else { previous=x; true };

so trim is split_change(f).skip(1).next().unwrap_or(&[]). We're maybe better off adding roughly this changed state machine somewhere like core::iter though, not sure.

burdges avatar May 14 '20 11:05 burdges

perhaps a more useful trim would use Default::default() to remove things.

@serid you should really be using &[&str] instead of &[char] because &[char] is useless.

SoniEx2 avatar May 14 '20 14:05 SoniEx2

For information, there is a new-ish unstable API split_inclusive on slices that would help implementing such a thing. (The normal split API doesn't include the "split marker" in either of resulting sub slices. More info here: https://github.com/rust-lang/rust/pull/67330) However, I neglected making a tracking issue, so there isn't a direct path toward stabilization at the moment. I'll try to scrape some time to create a tracking issue the next weekend!

golddranks avatar May 14 '20 15:05 golddranks

https://github.com/rust-lang/rust/issues/72360 Tracking issue created.

golddranks avatar May 19 '20 19:05 golddranks

I have a use case- I'm currently working on updates to BufWriter, and specifically its implementation of write_vectored, as a part of https://github.com/rust-lang/rust/issues/78551. write_vectored takes an &[IoSlice], and it'd be very useful to be able to trim empty slices from both ends. This would allow me to forward the trimmed list of slices to the inner write_vectored method, and also to specifically specialize the case where we received exactly 1 non-empty slice. These cases aren't served by iterator methods, because I need to transform slices into smaller slices to process & forward as necessary.

Lucretiel avatar Oct 31 '20 05:10 Lucretiel

Perhaps a different use-case for str::trim_*-alike methods for &[u8]... in my case, I have files (out of my control) that are mostly-ASCII-serialized structs from some unknown language/library which I'm parsing with nom, and I'm trimming leading whitespace from the input at every step as an easy way to ignore insignificant whitespace.

However, these files are only mostly ASCII - in some "fields", they contain straight binary, so I can't treat the entire file I'm reading as valid UTF-8 (this being the reason for using &[u8] over &str).

That being said, I did find https://github.com/rust-lang/rust/issues/94035 - which is for the same as this, just restricted to ASCII specifically. In my case, that would be good enough. These methods are currently available in nightly: https://doc.rust-lang.org/std/primitive.slice.html#method.trim_ascii

Kage-Yami avatar Jun 19 '22 07:06 Kage-Yami