zio
zio copied to clipboard
Add Additional Methods to Chunk
Right now we are a little hit or miss on methods on Chunk
. For example, we have drop
and dropWhile
but we don't have exists
or forall
. This can get a little annoying when you are working with Chunk
and run into one of these situations where you either have to drop back to using a Seq
or reimplement a method you are used to just having. This ticket is to go through the methods available on scala.collection.immutable.IndexedSeq
, make a recommendation regarding which of them make sense to implement for Chunk
, and then once we have agreement implement those methods.
Copying @Vilkina.
I can work on this. Also, we need a benchmark. So I need some help, @adamgfraser :)
Perhaps list all combinators missing from Chunk similar to how it looks in #2077 ? And reference this issue in future PRs @adamgfraser
@Vilkina are you perhaps still working on this?
i'm up for helping with this issue in case you don't find the time for it :slightly_smiling_face:
@pk044, you can help if you want. Definitely I don't have time. Thanks
@adamgfraser
what do you think about having these methods in Chunk
?
def find(f: A => Boolean): Option[A]
def head: Option[A]
def last: Option[A]
def indexWhere(f: A => Boolean): Option[Int]
@pk044 I think those look great. We may want to rename head
and last
to headOption
and lastOption
to more closely mirror usage in the existing Scala collections API. I realize we use head
and last
for ZStream
but that is an effect with a separate error channel versus this is supposed to be more of a drop in replacement for other data structures so may be more value in sticking closer to the existing API where possible. Similarly would be good to support an optional starting index for indexWhere
.
@adamgfraser do you think there's anything else I can add?
@pk044 I think there are a lot more methods we could add. I would take a look at the list of methods available for scala.collection.immutable.IndexedSeq
here. That is basically our competition. There are a ton of methods there and obviously some are more or less useful but at some level we need to compete with Array
and Vector
in terms of both performance and availability of relevant combinators.
The other thing that could be helpful is combinators that specifically make it easier to work with other ZIO data types. For example we have talked separately about having a ZIO.foreach
version for Chunk
, but since we control the Chunk
data type we can also just have it be an infix method on Chunk
so you can do chunk.foreach(f)
.
Finally, we could really use some benchmarks here, both to see where we are, inform efforts around specialization, and help us determine whether it makes sense to implement methods in terms of more general ones (e.g. there are a ton of methods we could potentially implement in terms of folds).
Chunk#foreach is Chunk#mapM :-) might make sense to rename or alias.
Ah, excellent point. I think it was the mapMPar
method I was missing at one point.
Are there any Methods still exists for implementing? (:
@Vilkina I added only 4 functions so you can go ahead and browse https://www.scala-lang.org/api/current/scala/collection/immutable/IndexedSeq.html and implement what's useful and missing from Chunk
:+1:
I want to suggest and implementgroupBy
, groupMap
,indexOf
, min
,minBy
, toSet
for now. After benchmark merged :)
I would like to try adding a few methods!
Here is some updated information on this issue, quoted from adamgfraser in the discord:
So Chunk was developed before Chunk extended IndexedSeq so some methods were manually implemented and then carried over once it did.
If you are working on that PR my suggestion would be to focus on one of two things. First, add benchmarks for all these operators on Chunk versus other collection types and see if we can get improved performance by implementing some methods ourselves instead of just inheriting from indexedSeq. Second, see if there are useful collection operators that aren't defined on normal Scala collections (maybe because they are effectual variants, they were added in 2.13 but not backported to 2.12, or otherwise seem useful). Particularly for the second one I would suggest commenting on the issue with the methods you want to add to get input from others and make sure no one else is working on the same ones.
For further consideration when working on this issue, see #3385 which discusses the possibility of changing Chunk
to no longer extend IndexedSeq
.
I will not be working on this issue, so it's free for someone else to try!
Chunk is now a Scala collection so it has all the Scala methods.
I think the scope of this ticket is now purely around specializing inherited methods (i.e. override and implement a custom collection method).
A method should be specialized only if we can do so in a way that increases performance. To know that, we need a benchmark. So really, this ticket should be about:
- Add a benchmark for some Chunk method.
- Compare that benchmark with Vector or Array performance.
- Decide if it makes sense to specialize the method.
Can I do this without https://github.com/zio/zio/issues/3353 ? Or shoud I join @subotic to do that one first? For this issue I should add some Vector benchmarks and compare those to the Chunk ones?
I picked up another one for now. So feel free to start this if you want. I may be back to this one later but I can also pair with someone.
Okay. I do think good benchmarks needs to come first so we are specializing methods where we are actually getting a performance benefit that actually justifies the additional code.