containers icon indicating copy to clipboard operation
containers copied to clipboard

More efficient folds for Seq

Open meooow25 opened this issue 1 year ago • 6 comments

We can define folds for Seq more efficiently. See thread starting https://github.com/haskell/containers/issues/1016#issuecomment-2336465956 and in particular the code by treeowl in https://github.com/haskell/containers/issues/1016#issuecomment-2337033674

The catch is that we need GADTs, which is not portable.

meooow25 avatar Sep 10 '24 17:09 meooow25

Might be worth noting that we already use a couple of extensions in a non-portable way

  1. CPP - Primarily to guard GHC features, including other extensions, or provide faster implementations on GHC
  2. BangPatterns - Compared to the portable alternative (seq), this is just so much nicer to use and read.

meooow25 avatar Oct 13 '24 17:10 meooow25

I started working on this today. The main difficulty I'm having is finding a good (not too awkward) way to get Sized evidence when we need it—which is sometimes a step earlier than it would naturally appear. In particular, it seems natural to have

foldMapWithIndexFT :: Depth t -> ... -> t -> m

When t ~ Node a, we very quickly need Sized a, which comes from pattern matching a second time on the depth. I suppose we could probably just maintain the current split between Node and Elem functions (which we currently do for performance), showing is to pass in Depth a in that context, by it feels a bit weird.

treeowl avatar Dec 06 '24 00:12 treeowl

Hmmm ... Even that split leaves us with similar awkwardness. Bleh.

treeowl avatar Dec 06 '24 00:12 treeowl

Okay, I think I've worked through the Sized issue reasonably well. Now I'm left with a pile of judgement calls about FingerTree vs. Seq, and various inlining issues. We can define Seq folds in terms of FingerTree folds, or vice versa, or neither. It seems "natural" to define Seq folds in terms of FingerTree folds, but:

  1. The folds are large, so GHC isn't eager to inline them. I think it's likely good to make GHC.Exts.inline effective on them if users want that, but we should do some benchmarking to see how much that's likely to matter.
  2. @augustss's MicroHs doesn't have a safe coercion mechanism, so if we do that, Seq folds will presumably pay a performance penalty there.

Related questions: Do we write other folds out in full, or define them using inline foldMap? What inlining pragmas should we put where to get the best behavior by default (probably just specialization)?

treeowl avatar Dec 10 '24 18:12 treeowl

The folds we currently have are implemented directly, and generally marked INLINE.

I added some fold benchmarks in #1068, I hope they're useful in evaluating the new implementation including whether or not pragmas help.

Performance on MicroHs is a whole another story, if we want to go there. I think there are bigger problems than coercions. For instance, going by the README, it doesn't fully implement BangPatterns.

The BangPatterns extension is parsed, but only effective at the a top level let/where.

We also don't have a way to measure performance, since we need tasty and tasty-bench to be compatible with MicroHs first.

meooow25 avatar Dec 14 '24 13:12 meooow25

The folds for digits and nodes can inline into the other folds. But GHC can't inline the folds going down FingerTree spines. With the GADTy version, that changes. We can pull out the now-static function argument, allowing inlining (and not just specialization) to occur. Whether we should is another question. That likely has its own performance cost in the usual case where it doesn't inline. I'm going to be offline most of the next week, but I'll try to put up some of the code I've written.

treeowl avatar Dec 14 '24 16:12 treeowl