cats icon indicating copy to clipboard operation
cats copied to clipboard

Optimize traverse

Open TimWSpence opened this issue 2 years ago • 30 comments

This addresses the StackSafeMonad part of https://github.com/typelevel/cats/issues/4408. I thought that there were enough changes in this to merit a separate PR for the Defer part.

Open questions (will edit these out of the description later):

  • The TraverseFilter laws hard-code Option. How do we test that the new branching implementation is lawful?

The following benchmarks were run on an AWS t3.xlarge:

Baseline (b08196e8526f6025d009ac0c8efc0a0e0ebcb1c4)

Benchmark (length) Mode Cnt Score Error Units
TraverseBench.filterList 10000 thrpt 20 3247.486 ± 270.817 ops/s
TraverseBench.filterVector 10000 thrpt 20 3351.585 ± 255.883 ops/s
TraverseBench.mapList 10000 thrpt 20 3198.905 ± 235.865 ops/s
TraverseBench.mapVector 10000 thrpt 20 3906.339 ± 225.508 ops/s
TraverseBench.traverseChain 10000 thrpt 20 556.706 ± 31.814 ops/s
TraverseBench.traverseChainError 10000 thrpt 20 1278.954 ± 74.242 ops/s
TraverseBench.traverseFilterChain 10000 thrpt 20 536.841 ± 38.696 ops/s
TraverseBench.traverseFilterList 10000 thrpt 20 532.103 ± 30.473 ops/s
TraverseBench.traverseFilterVector 10000 thrpt 20 590.312 ± 29.534 ops/s
TraverseBench.traverseList 10000 thrpt 20 537.023 ± 33.287 ops/s
TraverseBench.traverseListError 10000 thrpt 20 1674.930 ± 48.771 ops/s
TraverseBench.traverseVector 10000 thrpt 20 581.844 ± 32.074 ops/s
TraverseBench.traverseVectorError 10000 thrpt 20 1778.334 ± 135.859 ops/s
TraverseBench.traverse_Chain 10000 thrpt 20 490.822 ± 14.145 ops/s
TraverseBench.traverse_List 10000 thrpt 20 474.568 ± 28.995 ops/s
TraverseBench.traverse_Vector 10000 thrpt 20 622.924 ± 38.002 ops/s

Chain (2d5f4d7917804e921488f5f2e263e8349a6ceb42)

Benchmark (length) Mode Cnt Score Error Units
TraverseBench.filterList 10000 thrpt 20 3170.974 ± 146.628 ops/s
TraverseBench.filterVector 10000 thrpt 20 3217.881 ± 181.087 ops/s
TraverseBench.mapList 10000 thrpt 20 3443.969 ± 104.117 ops/s
TraverseBench.mapVector 10000 thrpt 20 3827.401 ± 148.074 ops/s
TraverseBench.traverseChain 10000 thrpt 20 784.845 ± 25.990 ops/s
TraverseBench.traverseChainError 10000 thrpt 20 1302.782 ± 62.342 ops/s
TraverseBench.traverseFilterChain 10000 thrpt 20 748.986 ± 62.151 ops/s
TraverseBench.traverseFilterList 10000 thrpt 20 669.817 ± 41.798 ops/s
TraverseBench.traverseFilterVector 10000 thrpt 20 618.644 ± 28.558 ops/s
TraverseBench.traverseList 10000 thrpt 20 583.413 ± 27.079 ops/s
TraverseBench.traverseListError 10000 thrpt 20 1253.510 ± 76.328 ops/s
TraverseBench.traverseVector 10000 thrpt 20 581.975 ± 27.619 ops/s
TraverseBench.traverseVectorError 10000 thrpt 20 1302.362 ± 66.015 ops/s
TraverseBench.traverse_Chain 10000 thrpt 20 3598.033 ± 108.420 ops/s
TraverseBench.traverse_List 10000 thrpt 20 4048.287 ± 158.090 ops/s
TraverseBench.traverse_Vector 10000 thrpt 20 4094.024 ± 101.115 ops/s

Vector (702ab8b9d1ebad207bb7c97eb671da333e667ea0)

Benchmark (length) Mode Cnt Score Error Units
TraverseBench.filterList 10000 thrpt 20 3305.299 ± 233.483 ops/s
TraverseBench.filterVector 10000 thrpt 20 3253.531 ± 133.993 ops/s
TraverseBench.mapList 10000 thrpt 20 3308.961 ± 138.697 ops/s
TraverseBench.mapVector 10000 thrpt 20 3911.187 ± 142.582 ops/s
TraverseBench.traverseChain 10000 thrpt 20 594.535 ± 29.777 ops/s
TraverseBench.traverseChainError 10000 thrpt 20 1084.548 ± 75.686 ops/s
TraverseBench.traverseFilterChain 10000 thrpt 20 642.895 ± 55.621 ops/s
TraverseBench.traverseFilterList 10000 thrpt 20 682.965 ± 41.234 ops/s
TraverseBench.traverseFilterVector 10000 thrpt 20 696.235 ± 59.945 ops/s
TraverseBench.traverseList 10000 thrpt 20 582.275 ± 35.877 ops/s
TraverseBench.traverseListError 10000 thrpt 20 1114.959 ± 52.641 ops/s
TraverseBench.traverseVector 10000 thrpt 20 638.235 ± 42.533 ops/s
TraverseBench.traverseVectorError 10000 thrpt 20 1098.038 ± 62.946 ops/s
TraverseBench.traverse_Chain 10000 thrpt 20 3480.086 ± 178.654 ops/s
TraverseBench.traverse_List 10000 thrpt 20 3950.650 ± 171.441 ops/s
TraverseBench.traverse_Vector 10000 thrpt 20 3695.495 ± 182.401 ops/s

TimWSpence avatar Aug 23 '23 16:08 TimWSpence

  • The TraverseFilter laws hard-code Option. How do we test that the new branching implementation is lawful?

Can we add some new laws in terms of OptionT[Eval, _]? Edit: I don't think this will work unless we make sure that OptionT offers a StackSafeMonad if F is a StackSafeMonad.

Should I open a separate PR for the new benchmarks first?

Nah.

armanbilge avatar Aug 23 '23 16:08 armanbilge

Can you post the results of the benchmarks? Apologies if I missed them.

johnynek avatar Aug 23 '23 16:08 johnynek

Can you post the results of the benchmarks? Apologies if I missed them.

Yes will do. I had originally asked whether I should add the new ones in a separate PR first which was why I didn't run them straight away

TimWSpence avatar Aug 23 '23 18:08 TimWSpence

@armanbilge @johnynek apologies it's taken me so long to get round to this. I've updated with benchmarks for the mutable builder approach. Are you still worried about lawfulness? Shall I try to do it with immutable data structures?

Those traverse_ improvements though 😆

TimWSpence avatar Nov 05 '23 16:11 TimWSpence

Thanks for working on this. What a don't see is how a Monad such as fs2.Stream[IO, A] would interact here. In that. Case you can have laziness, concurrency and more than one output per input.

Can you give an explanation as to why we shouldn't worry about mutability in such a case?

johnynek avatar Nov 08 '23 22:11 johnynek

Can you give an explanation as to why we shouldn't worry about mutability in such a case?

We can't have concurrency: we are sequencing the effects with flatMap, which by law must be sequential.

armanbilge avatar Nov 08 '23 23:11 armanbilge

We can't have concurrency: we are sequencing the effects with flatMap, which by law must be sequential.

This is definitely a colloquial understanding of Monads, that they represent an sequential computation, but I don't see the law that allows us to make the claim you want.

For instance, in the current PR we are using immutable items inside the monadic type constructor. I think the current writing will work. But if we changed those to be mutable, I don't see a proof that it will work, even though it may.

For instance, consider this type:

sealed trait Tree[F[_], A]
case class Item[F[_], A](fa: A) extends Tree[F, A]
case class InsideF[F[_], A](treeF: F[Tree[F, A]]) extends Tree[F, A]
case class Branch[A](left: Tree[F, A], right: Tree[F, A]) extends Tree[F, A]

object Tree {
  def pure[F[_], A](a: A): Tree[F, A] = Item(a)
  def flatMap[F[_]: Functor, A, B](fa: Tree[F, A])(fn: A => Tree[F, B]): Tree[F, B] =
    fa match {
      case Item(a) => fn(a)
      case InsideF(fa) => InsideF(fa.map(flatMap(_)(fn)))
      case Branch(left, right) =>
        Branch(flatMap(left)(fn), flatMap(right)(fn))
    }
}

Now, I haven't checked if this follows the monad laws as we have expressed them, but I think it or something like it could. In this picture note we are recursively calling flatMap inside the F context and outside of it. So, if F is lazy, we return before we have fully recursed.

Couldn't something like this cause a problem?

I think saying "monads are sequential" is a bit too hand-wavey to bolt into the default implementation for every user of cats. I think we need a stronger argument than that.

johnynek avatar Nov 09 '23 01:11 johnynek

Oscar is right: using mutable data structures to implement traverse will not work, but perhaps surprisingly the issue is not concurrency or laziness. I added a test in 2210d8271792a14c20b5b25b7b54b82f464779f6 that fails when tested against the builder-based implementation in a1c9ff5d02f744cc6789627877c98059e6b4b3a5. There is some cheating there to pretend that List is a stacksafe monad just so that we invoke the traverseDirectly codepath.

As it turns out, the problem is actually here: https://github.com/typelevel/cats/blob/a1c9ff5d02f744cc6789627877c98059e6b4b3a5/core/src/main/scala/cats/Traverse.scala#L299-L302

For List-like monads, you may invoke that map function multiple times for each element of the List. If the accumulator is shared mutable state then this breaks down very quickly, which is exactly what that test demonstrates.

Suffice to say: my bad 😅

armanbilge avatar Nov 10 '23 21:11 armanbilge

Thanks @johnynek and @armanbilge I'm convinced that we should use immutable data structures 😆 I've run the benchmarks for the immutable vector variant. I'll do the same for immutable list and chain - I don't have a good intuition which would be best?

Also thanks for the other comments - I'll address those too :)

TimWSpence avatar Nov 14 '23 12:11 TimWSpence

Apologies again for the delay on this. The blocker is that I keep forgetting to run the benchmarks overnight and I need my laptop during the day and it takes almost two hours to run.

I've still got to fix the Map instance but I'll do that once we've settled on an approach for the other types. Annoyingly I should probably re-do the Vector variant with the Applicative method changes so I still need to benchmark

  1. Chain (c5d90a69657bd7a0cc0a654cfbcd798ed9f18274)
  2. List (f95b0876d205f71559ef05d92cbcec8c1a700c40)
  3. Vector (todo)

TimWSpence avatar Dec 08 '23 15:12 TimWSpence

OK I've run the benchmarks for the Chain implementation and the traverse_ performance dropped significantly. But I didn't touch the implementation of traverse_... I've running these on an M2 Max. I've heard of benchmarks ending up running on its efficiency cores - could that be what happened?

Also realized that I didn't update TraverseFilter to use Chain 😭

TimWSpence avatar Dec 13 '23 14:12 TimWSpence

I've running these on an M2 Max

Screen Shot 2023-12-13 at 8 32 01 AM

Final warning — do not under any circumstances do this on Apple hardware.

https://www.youtube.com/watch?v=n5u7DgFwLGE&t=1541s

armanbilge avatar Dec 13 '23 16:12 armanbilge

I've running these on an M2 Max

Screen Shot 2023-12-13 at 8 32 01 AM

Final warning — do not under any circumstances do this on Apple hardware.

https://www.youtube.com/watch?v=n5u7DgFwLGE&t=1541s

😆 Sadly it's all I have. Any suggestions?

TimWSpence avatar Dec 14 '23 16:12 TimWSpence

OK I've now got access to a relatively chonky EC2 instance so I'll try to get the full set of benchmarks done ASAP

TimWSpence avatar Dec 18 '23 14:12 TimWSpence

OK there are the benchmarks for baseline, chain and vector implementations. The traverse_ improvements are looking pretty sweet but the traverse improvements are currently very small (especially when we have to convert between data types). I'll try to investigate in the new year but if anyone has ideas in the meantime then I'd love to hear them!

TimWSpence avatar Dec 22 '23 16:12 TimWSpence

Am I wrong that the Chain version seems to beat the Vector version?

johnynek avatar Dec 24 '23 19:12 johnynek

@armanbilge @johnynek @valencik sorry for letting this languish for so long 😭 I've now reverted the last change which was just to get some baseline benchmarks and merged main in again.

Just wanted to gauge people's thoughts on the best direction from here. Assuming that my benchmarks are reliable (if anyone else wants to run them as well, that would be fantastic!), the results are a bit disappointing for traverse (baseline, chain and vector all look relatively similar. The new versions are slightly better, especially for Chain collections with the Chain implementation and similarly for Vector) but pretty great for traverse_. Where do you think we should go from here?

TimWSpence avatar Mar 04 '24 15:03 TimWSpence

I listed a couple of minor nits, but I think any performance improvement is worth merging, especially around traverse since it is such a key function.

Thanks! Do you think it's worth making separate Chain and Vector implementations respectively or just implement them all with Chain as that seems to be the best overall?

TimWSpence avatar Mar 04 '24 22:03 TimWSpence

I would go where the benchmarks suggest if we have confidence in them.

johnynek avatar Mar 04 '24 23:03 johnynek

@valencik you foolishly volunteered yourself on Discord to run the benchmarks as well 😆 If you have the chance, I would massively appreciate another set of data to confirm that what I'm seeing is correct

TimWSpence avatar Mar 05 '24 12:03 TimWSpence

Mostly https://github.com/typelevel/cats/pull/4498/commits/803107de2f728f648800513ef5cff242a1172fbf * ( Sorry, I actually did apply suggested change in https://github.com/typelevel/cats/pull/4498#discussion_r1512844912 )

[info] Benchmark                           (length)   Mode  Cnt     Score     Error  Units
[info] TraverseBench.filterList               10000  thrpt   20  5683.590 ±  73.700  ops/s
[info] TraverseBench.filterVector             10000  thrpt   20  5665.475 ±  34.436  ops/s
[info] TraverseBench.mapList                  10000  thrpt   20  5833.001 ±  45.470  ops/s
[info] TraverseBench.mapVector                10000  thrpt   20  6206.558 ±  45.317  ops/s
[info] TraverseBench.traverseChain            10000  thrpt   20  1157.471 ±   8.626  ops/s
[info] TraverseBench.traverseChainError       10000  thrpt   20  2080.678 ±  15.341  ops/s
[info] TraverseBench.traverseFilterChain      10000  thrpt   20  1201.729 ±   6.121  ops/s
[info] TraverseBench.traverseFilterList       10000  thrpt   20  1143.251 ±   8.666  ops/s
[info] TraverseBench.traverseFilterVector     10000  thrpt   20  1232.990 ±   7.090  ops/s
[info] TraverseBench.traverseList             10000  thrpt   20  1085.565 ±   8.561  ops/s
[info] TraverseBench.traverseListError        10000  thrpt   20  2192.903 ±   9.450  ops/s
[info] TraverseBench.traverseVector           10000  thrpt   20  1162.033 ±  22.101  ops/s
[info] TraverseBench.traverseVectorError      10000  thrpt   20  2178.812 ±  22.956  ops/s
[info] TraverseBench.traverse_Chain           10000  thrpt   20  7644.606 ± 151.222  ops/s
[info] TraverseBench.traverse_List            10000  thrpt   20  8946.374 ± 124.209  ops/s
[info] TraverseBench.traverse_Vector          10000  thrpt   20  9036.810 ±  37.314  ops/s

and against main (https://github.com/typelevel/cats/commit/d323facbfcfac7803ed85f75e16e302f68c4ad45) but with TraverseBench.scala from https://github.com/typelevel/cats/pull/4498/commits/803107de2f728f648800513ef5cff242a1172fbf:

[info] Benchmark                           (length)   Mode  Cnt     Score     Error  Units
[info] TraverseBench.filterList               10000  thrpt   20  5781.907 ±  31.753  ops/s
[info] TraverseBench.filterVector             10000  thrpt   20  5667.024 ±  53.352  ops/s
[info] TraverseBench.mapList                  10000  thrpt   20  5864.307 ±  45.807  ops/s
[info] TraverseBench.mapVector                10000  thrpt   20  6245.999 ±  46.899  ops/s
[info] TraverseBench.traverseChain            10000  thrpt   20  1072.360 ±  25.156  ops/s
[info] TraverseBench.traverseChainError       10000  thrpt   20  2716.055 ± 100.545  ops/s
[info] TraverseBench.traverseFilterChain      10000  thrpt   20  1029.492 ±  51.776  ops/s
[info] TraverseBench.traverseFilterList       10000  thrpt   20   928.622 ±   3.589  ops/s
[info] TraverseBench.traverseFilterVector     10000  thrpt   20  1002.081 ±  44.357  ops/s
[info] TraverseBench.traverseList             10000  thrpt   20   948.305 ±  26.980  ops/s
[info] TraverseBench.traverseListError        10000  thrpt   20  2722.420 ±  77.210  ops/s
[info] TraverseBench.traverseVector           10000  thrpt   20  1011.813 ±  19.254  ops/s
[info] TraverseBench.traverseVectorError      10000  thrpt   20  3366.467 ±  35.241  ops/s
[info] TraverseBench.traverse_Chain           10000  thrpt   20   876.282 ± 100.385  ops/s
[info] TraverseBench.traverse_List            10000  thrpt   20   779.133 ±  14.965  ops/s
[info] TraverseBench.traverse_Vector          10000  thrpt   20  1093.288 ±  45.168  ops/s

By the looks of things filterList, filterVector, mapList, and mapVector are all within errror when comparing this PR against latest main. We have small improvements for traverseChain, traverseFilterChain, traverseFilterList, traverseFilterVector, traverseList, and traverseVector. Medium regressions for traverseChainError, traverseListError, traverseVectorError. And then very large improvements for traverse_Chain, traverse_List, and traverse_Vector.

Summarized differently:

🏎️ traverse_ 🏃 traverse 🏃 traverseFilter 🆗 filter 🆗 map 🐢 traverseError

valencik avatar Mar 06 '24 13:03 valencik

Wanted to confirm things, so I ran again and came up with the same results.

I ran these benchmarks (both previously and here) with the following in sbt: bench/jmh:run -i 10 -wi 10 -f 2 -t 1 cats.bench.TraverseBench

And with the following jvmopts: -Xms2G -Xmx12G

Details

on exactly https://github.com/typelevel/cats/pull/4498/commits/803107de2f728f648800513ef5cff242a1172fbf

[info] Benchmark                           (length)   Mode  Cnt     Score    Error  Units
[info] TraverseBench.filterList               10000  thrpt   20  5715.626 ±  29.321  ops/s
[info] TraverseBench.filterVector             10000  thrpt   20  5546.558 ±  19.276  ops/s
[info] TraverseBench.mapList                  10000  thrpt   20  5767.038 ±  34.619  ops/s
[info] TraverseBench.mapVector                10000  thrpt   20  6173.065 ±  40.277  ops/s
[info] TraverseBench.traverseChain            10000  thrpt   20  1123.427 ±   4.987  ops/s
[info] TraverseBench.traverseChainError       10000  thrpt   20  2127.662 ±  13.144  ops/s
[info] TraverseBench.traverseFilterChain      10000  thrpt   20  1180.463 ±   5.585  ops/s
[info] TraverseBench.traverseFilterList       10000  thrpt   20  1206.215 ±   9.615  ops/s
[info] TraverseBench.traverseFilterVector     10000  thrpt   20  1244.894 ±  25.955  ops/s
[info] TraverseBench.traverseList             10000  thrpt   20  1069.802 ±  13.869  ops/s
[info] TraverseBench.traverseListError        10000  thrpt   20  2167.488 ±  21.981  ops/s
[info] TraverseBench.traverseVector           10000  thrpt   20  1152.973 ±  25.078  ops/s
[info] TraverseBench.traverseVectorError      10000  thrpt   20  2207.517 ±  33.643  ops/s
[info] TraverseBench.traverse_Chain           10000  thrpt   20  7912.984 ± 107.098  ops/s
[info] TraverseBench.traverse_List            10000  thrpt   20  8934.534 ±  46.916  ops/s
[info] TraverseBench.traverse_Vector          10000  thrpt   20  9092.209 ±  36.689  ops/s

and on main:

[info] Benchmark                           (length)   Mode  Cnt     Score    Error  Units
[info] TraverseBench.filterList               10000  thrpt   20  5710.935 ± 65.052  ops/s
[info] TraverseBench.filterVector             10000  thrpt   20  5586.684 ± 53.060  ops/s
[info] TraverseBench.mapList                  10000  thrpt   20  5794.559 ± 31.269  ops/s
[info] TraverseBench.mapVector                10000  thrpt   20  6156.259 ± 29.844  ops/s
[info] TraverseBench.traverseChain            10000  thrpt   20  1092.330 ±  7.171  ops/s
[info] TraverseBench.traverseChainError       10000  thrpt   20  2713.055 ± 41.359  ops/s
[info] TraverseBench.traverseFilterChain      10000  thrpt   20   982.057 ±  6.179  ops/s
[info] TraverseBench.traverseFilterList       10000  thrpt   20   873.508 ± 11.078  ops/s
[info] TraverseBench.traverseFilterVector     10000  thrpt   20   986.796 ± 41.753  ops/s
[info] TraverseBench.traverseList             10000  thrpt   20   918.397 ± 39.014  ops/s
[info] TraverseBench.traverseListError        10000  thrpt   20  2781.320 ± 15.372  ops/s
[info] TraverseBench.traverseVector           10000  thrpt   20   997.688 ±  4.798  ops/s
[info] TraverseBench.traverseVectorError      10000  thrpt   20  3081.201 ± 31.311  ops/s
[info] TraverseBench.traverse_Chain           10000  thrpt   20   939.052 ± 20.897  ops/s
[info] TraverseBench.traverse_List            10000  thrpt   20   808.780 ±  8.011  ops/s
[info] TraverseBench.traverse_Vector          10000  thrpt   20  1111.320 ±  9.188  ops/s

valencik avatar Mar 06 '24 23:03 valencik

@valencik absolute legend ❤️ Couple of points:

  1. I'm dumb and not thinking straight 😅 I was thinking the Map implementation would be dependent on either Chain or Vector and so I was waiting for the outcome of that debate. I'll fix it now
  2. map and filter should be unchanged by this so all of the results apart from traverseError are expected, right? It's certainly not obvious to me why traverseError is worse now...
  3. Do you think it's worth having separate implementations for Vector and Chain? From my results they seemed to be noticeably better when we're not doing a conversion from one to the other

TimWSpence avatar Mar 07 '24 17:03 TimWSpence

@valencik sorry to bother you. Just tagging you in case you missed my previous comment

TimWSpence avatar Mar 18 '24 10:03 TimWSpence

2. map and filter should be unchanged by this so all of the results apart from traverseError are expected, right? It's certainly not obvious to me why traverseError is worse now...

I agree, and similarly, I also do not know why traverseError would be worse now.

3. Do you think it's worth having separate implementations for Vector and Chain? From my results they seemed to be noticeably better when we're not doing a conversion from one to the other

This almost seems like followup work to me? I think, if we are fine with the traverseError regression, we should merge this work (or fix traverseError and then merge). This already is only one part of https://github.com/typelevel/cats/issues/4408, we can have incremental improvements follow it.

valencik avatar Mar 20 '24 23:03 valencik

  1. map and filter should be unchanged by this so all of the results apart from traverseError are expected, right? It's certainly not obvious to me why traverseError is worse now...

I agree, and similarly, I also do not know why traverseError would be worse now.

  1. Do you think it's worth having separate implementations for Vector and Chain? From my results they seemed to be noticeably better when we're not doing a conversion from one to the other

This almost seems like followup work to me? I think, if we are fine with the traverseError regression, we should merge this work (or fix traverseError and then merge). This already is only one part of #4408, we can have incremental improvements follow it.

Yeah I'm more than happy to draw a line under this PR and follow up separately - it's been open far too long 😆 I'll move it out of draft now

TimWSpence avatar Mar 21 '24 16:03 TimWSpence

Argh @valencik I've just realized the current HEAD uses Vector rather than Chain. From what I saw, Chain seemed to be better overall. Shall I switch it back? I think that was @johnynek's view as well?

TimWSpence avatar Mar 22 '24 10:03 TimWSpence

Argh @valencik I've just realized the current HEAD uses Vector rather than Chain. From what I saw, Chain seemed to be better overall. Shall I switch it back? I think that was @johnynek's view as well?

Just to double check I understand, do you mean the implementation of traverseDirectly?

Yeah, let's switch that to Chain and I can rerun benchmarks.

valencik avatar Mar 22 '24 12:03 valencik

Currently running benchmarks again on a tweak which uses Chain in traverseDirectly and traverseFilterDirectly https://github.com/typelevel/cats/compare/803107de2f728f648800513ef5cff242a1172fbf...valencik:cats:more-chain?expand=1

valencik avatar Mar 22 '24 13:03 valencik

Ok, I ran the benchmarks, I think Chain is better, and put up a PR to your branch/fork @TimWSpence https://github.com/TimWSpence/cats/pull/2

Feel free to leave it for a day or two. There's a ton of numbers here. I wouldn't be against taking another look over things.

I also ran a test benchmark run with a traverseDirectlyVector variant, and I think that shows some mild improvements. But again, maybe we should save that for a follow up.

valencik avatar Mar 22 '24 23:03 valencik