itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Itertools::join

Open pickfire opened this issue 3 years ago • 8 comments

Our current join only outputs String, maybe we should consider using the nightly Join in standard library? I created a pull request using that in standard library but I noticed requires allocation so it is less likely to be included in standard std library.

https://github.com/rust-lang/rust/pull/75738

Maybe it can be considered here after it is stable on standard library?

pickfire avatar Feb 07 '21 15:02 pickfire

How about something like:

trait JoinableIterator<S>: Iterator {
    fn join<O>(self, sep: S) -> O
    where 
        Self: Sized,
        O: Join<Self::Item, S>
    {
        O::join(self, sep)
    }
}
impl<I: Iterator, S> JoinableIterator<S> for I {}

trait Join<T, S> {
    fn join<I: IntoIterator<Item = T>>(iter: I, sep: S) -> Self;
}

impl<T: std::fmt::Display, S: std::fmt::Display> Join<T, S> for String {
    fn join<I: IntoIterator<Item = T>>(iter: I, sep: S) -> Self {
        let mut iter = iter.into_iter();
        match iter.next() {
            Some(item) => {
                use std::fmt::Write;
                let mut s = item.to_string();
                iter.for_each(|item| {
                    let _ = write!(s, "{}", sep);
                    let _ = write!(s, "{}", item);
                });
                s
            }
            None => String::new(),
        }
    }
}
// Possibly other implementations

The S generic parameter on JoinableIterator is for better turbofish ergonomics.

SkiFire13 avatar Feb 09 '21 15:02 SkiFire13

I thought about something like that but I haven't been able to figure it out. But maybe not Display, potential footgun if users accidentally put something that is display but they don't want that behavior, it should be all &str or String. But yeah, that would cause a breaking change in itertools, but IMHO I wouldn't want the Display, it is doing write! (which possibly is hard to optimize) as well as there is nothing that guards users from doing side effect when impl ToString which impl Display uses, so limiting it to the original type is better I think.

pickfire avatar Feb 09 '21 17:02 pickfire

I based myself on the current implementation which requires Self::Item: Display, so I kept that bound. I think the original idea was that a join with Strings/&strs only would require you to allocate a new String for each iterator element and that would be slower, but I guess we can change that if it's considered a bad API.

As for the .to_string(), it uses Display under the hood and there's a blanket implementation for T: Display so I thought there can't be an user provided implementation that does side effects, however I forgot about specialization.

SkiFire13 avatar Feb 09 '21 19:02 SkiFire13

I based myself on the current implementation which requires Self::Item: Display, so I kept that bound. I think the original idea was that a join with Strings/&strs only would require you to allocate a new String for each iterator element and that would be slower, but I guess we can change that if it's considered a bad API.

User could still use &str if they want it to be more efficient so they don't want to allocate, we probably also does not want to force String, maybe Borrow<str> or AsRef<str>?

As for the .to_string(), it uses Display under the hood and there's a blanket implementation for T: Display so I thought there can't be an user provided implementation that does side effects, however I forgot about specialization.

No, user can override that. Still, that is the reason why I think it is weird and didn't end up using this join from itertools.

Also, if we use Display, what would be for join for &[u8]?

pickfire avatar Feb 10 '21 02:02 pickfire

User could still use &str if they want it to be more efficient so they don't want to allocate, we probably also does not want to force String, maybe Borrow or AsRef?

You still need to get that &str. What if I want to join an iterator of numbers into a string? e.g. I want their human representation printed.

No, user can override that. Still, that is the reason why I think it is weird and didn't end up using this join from itertools.

On stable? I don't think so. The following code doesn't compile if you don't enable specialization (however with specialization it does)

struct Foo;

impl std::fmt::Display for Foo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        todo!()
    }
}

impl std::string::ToString for Foo {
    fn to_string(&self) -> String {
        todo!()
    }
}

Also, if we use Display, what would be for join for &[u8]?

Display is a bound only on the implementation that produces String, it doesn't prevent you from writing other implementations.

SkiFire13 avatar Feb 10 '21 08:02 SkiFire13

You still need to get that &str. What if I want to join an iterator of numbers into a string? e.g. I want their human representation printed.

Yes, they have to convert between different types.

On the other hand, what if there is a secret type than the users does not want to implement Display but only ToString? Does that mean this is not usable? Or they have to do some conversion?

I do think Display is more efficient (since it could do less allocation) but I don't think Display is good, what if it's Vec or any other types that users would not want Display? It would be preferable if they do the conversion themselves or it may be surprising that suddenly a random string appear within the join, at least it is less magic, user (or library authors) can put magic in Display but nothing can go wrong if it's already String or &str, then it may be surprising to see that part being included in join.

Or maybe we should ask in standard library why didn't they do slice::join with Display?

Say, if users have Vec<User> do we want them to implement Display or would we want them to convert it themselves to String? If we allow Display they may have a tendency to implement that rather that doing the conversion themselves, and later they might find out that they need a different Display, once they changed the Display and it may break the join, so it is better to not merge Display for this purpose.

pickfire avatar Feb 10 '21 15:02 pickfire

Say, if users have Vec<User> do we want them to implement Display or would we want them to convert it themselves to String? If we allow Display they may have a tendency to implement that rather that doing the conversion themselves, and later they might find out that they need a different Display, once they changed the Display and it may break the join, so it is better to not merge Display for this purpose.

This is a pretty good point. Maybe alternatively we could have a join_with method with its own backing trait that allows the user to specify how the item should be appended using a closure. This way we can make it more general for any type, not just Strings.

SkiFire13 avatar Feb 10 '21 20:02 SkiFire13

This is a pretty good point. Maybe alternatively we could have a join_with method with its own backing trait that allows the user to specify how the item should be appended using a closure. This way we can make it more general for any type, not just Strings.

Good idea, or maybe we can have a function but I wonder if we could let users specify their own trait bounds? Do we give the users power to do write! or we only handle that and let the users provide AsRef<&str>?

By the way, happy chinese new year!

pickfire avatar Feb 13 '21 15:02 pickfire