type-level-sets Compilation performance

I have been trying to compute the union of 2 sets of 9 elements each and ... well the compiler is still running :-). Have you or anyone else investigating ways to speed up those type level computations?

Jun 01 '18 22:06 etorreborre

Could you post the example? I haven't had any problems with this before. It shouldn't take that long!

Jun 01 '18 22:06 dorchard

I am using the following Cmp instance to compare types deriving Generic

type family TypeName a :: Symbol where
  TypeName Double = "Double"
  TypeName Int = "Int"
  TypeName String = "String"
  TypeName (M1 D ('MetaData name _ _ _) f ()) = name
  TypeName a = TypeName (Rep a ())

type family Cmp (a :: k) (b :: k) :: Ordering where
  Cmp a b = CmpSymbol (TypeName a) (TypeName b)

And here is some code which takes a very long time to compile:

import           GHC.Generics
import           Prelude
import           Data.Type.Set

newtype X0  = X0  Int deriving (Generic)
newtype X1  = X1  Int deriving (Generic)
newtype X2  = X2  Int deriving (Generic)
newtype X3  = X3  Int deriving (Generic)
newtype X4  = X4  Int deriving (Generic)
newtype X5  = X5  Int deriving (Generic)
newtype X6  = X6  Int deriving (Generic)
newtype X7  = X7  Int deriving (Generic)
newtype X8  = X8  Int deriving (Generic)
newtype X9  = X9  Int deriving (Generic)
newtype X10 = X10 Int deriving (Generic)
newtype X11 = X11 Int deriving (Generic)
newtype X12 = X12 Int deriving (Generic)
newtype X13 = X13 Int deriving (Generic)
newtype X14 = X14 Int deriving (Generic)
newtype X15 = X15 Int deriving (Generic)
newtype X16 = X16 Int deriving (Generic)
newtype X17 = X17 Int deriving (Generic)
newtype X18 = X18 Int deriving (Generic)
newtype X19 = X19 Int deriving (Generic)

type R0_to_9 = '[
    X0
  , X1
  , X2
  , X3
  , X4
  , X5
  , X6
  , X7
  , X8
  , X9
  ]

type R10_to_19 = '[
    X10
  , X11
  , X12
  , X13
  , X14
  , X15
  , X16
  , X17
  , X18
  , X19
  ]

type R0_to_19 = '[
    X0
  , X1
  , X2
  , X3
  , X4
  , X5
  , X6
  , X7
  , X8
  , X9
  , X10
  , X11
  , X12
  , X13
  , X14
  , X15
  , X16
  , X17
  , X18
  , X19
  ]

rX_0_9 :: Set R0_to_9
rX_0_9 =
  s (X0 1) `append`
  s (X1 1) `append`
  s (X2 1) `append`
  s (X3 1) `append`
  s (X4 1) `append`
  s (X5 1) `append`
  s (X6 1) `append`
  s (X7 1) `append`
  s (X8 1) `append`
  s (X9 1)

rX_10_19 :: Set R10_to_19
rX_10_19 =
  s (X10 1) `append`
  s (X11 1) `append`
  s (X12 1) `append`
  s (X13 1) `append`
  s (X14 1) `append`
  s (X15 1) `append`
  s (X16 1) `append`
  s (X17 1) `append`
  s (X18 1) `append`
  s (X19 1)

s :: a -> Set '[a]
s a = Ext a Empty

rx_0_18 :: Set R0_to_19
rx_0_18 = rX_0_9 `union` rX_10_19

Maybe you have a better idea for the Cmp instance which would make things faster to compile but I suspect that this is only the typelevel computations which are creating too big types. For the record I have more than an intellectual interest in the subject, I am actually planning to use this type-level union of types for a dependency-injection library :-).

Jun 03 '18 17:06 etorreborre

Ouch this IS slow. I will look into it when I can. I tried this version but I used an instance of Cmp rather than redefining the whole family, i.e.

type instance Cmp a b =
  CmpSymbol (TypeName a) (TypeName b)

Is that ok?

Nov 29 '18 11:11 dorchard

Is there a "canonical" way to compare types based on their Typeable instance? This is what I would like to have eventually.

Nov 29 '18 13:11 etorreborre

And thanks for having a look. My impression is that short of having compiler support for type-level sets we will always be quite slow but I hope I'm wrong (I have to put up to 100 elements in such a structure).

Nov 29 '18 13:11 etorreborre

As a quick way to see how the type family is being evaluated, you can do:

ghc-options: -ddump-tc-trace -ddump-to-file -dsupress-all

Search for *.ddump-tc-trace inside your .stack-work or dist folders.

EDIT: The file gets very big very fast. You may want to terminate the build and just inspect it. I suggest you use a terminal editor like vim/nano since they can handle crazy big files well.

Apr 03 '19 13:04 rahulmutt

Thanks for the tip, what kind of insights should I be after when reading such a file? Do you think I'd be able to spot some alternative way of doing typelevel sets? My impression is that this is all slow because we are doing a typelevel quicksort after all!

Apr 03 '19 14:04 etorreborre

Main thing is to grep for a particular type function to see what arguments it's being applied to.

It turns out the implementation of Filter in this library takes exponential time and not linear because If isn't lazy since it's a type-level function and type functions are strict. Hence it's evaluating both branches of the If before selecting on the condition.

I did fix this in a local branch (adding a lot of boilerplate) and it ran much faster but I ended up getting a type error with your example program for some reason. Need to take a closer look.

When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.

Apr 04 '19 12:04 rahulmutt

Thanks a lot @rahulmutt for having a look at this, you must be otherwise terribly busy!

When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.

I just need sets at the type level in my registry library to track the inputs and outputs of functions stored in the Registry. When I use registry for assembling Hedgehog generators I have a case where I have approximately 250 types to track but only 140 once deduplicated. And checking that I can get an element out of the registry takes around 15 to 20 seconds. If I could reduce that time I would do a compile-check in my code instead of discovering that a generator is missing at runtime.

Apr 04 '19 15:04 etorreborre

@etorreborre The following code compiles in about 2 seconds on my machine:

https://gist.github.com/rahulmutt/98a12cb3eec9fe800081a627a8314526

It uses insertion sort which should be OK for small lists - you probably won't be dealing with type-level lists large than, say 1000. Let me know if you need it to run even faster.

Apr 04 '19 17:04 rahulmutt

Thanks @rahulmutt I wanted to try your solution at my real life solution but I'm bumping into another issue which I also did not know how to solve. This is a closed type family

type family TypeName a :: Symbol where
  TypeName Int = "Int"
  TypeName Text = "Text"
  TypeName (M1 D ('MetaData name _ _ _) f ()) = name
  TypeName a = TypeName (Rep a ())

but I don't know in advance all the types I am going to stick in my registry. Also I'd rather avoid imposing that people derive Generic or anything else on the components they put in the registry. I remember searching for a solution for this but it didn't seem that GHC gives us any handle here. Which is quite weird because it should have a handle of the symbol name corresponding to a type, right?

Apr 05 '19 08:04 etorreborre

It really doesn't seem to be possible, otherwise we wouldn't have https://hackage.haskell.org/package/compare-type-0.1.1/docs/Type-Compare.html :-(.

Apr 05 '19 08:04 etorreborre

TypeName can be open - the algorithm does not depend on its closed-ness. The user will have to specify an instance for TypeName for every input/output type they put into a registry OR make it derive Generic.

As of now, it's not possible to get a Symbol from an arbitrary type without having that type having deriving Generic in the first place. The only way you can get a canonical ordering of all types is if GHC supported reifying the TypeRep fingerprint to the type-level as a Nat or if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types.

Apr 05 '19 08:04 rahulmutt

The user will have to specify an instance for TypeName for every input/output type they put into a registry OR make it derive Generic

Unfortunately that wouldn't be very practical because some types come from libraries and my intention is to reduce boilerplate as much as possible. Maybe this means a trade-off between ease-of-use and compilation times in some cases.

if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types

I doubt that the GHC team is willing to support such an obscure use case. Maybe if I can find other people needing sorting of arbitrary type lists we could make a collective case for it?

Apr 05 '19 09:04 etorreborre

I just realized that I would have other issues to solve even if I could solve that one: error messages! I would to a nicer representation of the type level sets otherwise errors would look like:

No instance for (IsSubset
                         (Nub
                            (SortHelper
                               (Nub
                                  (SortHelper
                                     ('[Int]
                                      :++ Nub
                                            (SortHelper2
                                               (NotEqual
                                                  (GHC.TypeLits.CmpSymbol
                                                     (TypeName (Rep Text1 ())) "Int")
                                                  'GT)
                                               Text1
                                               '[]
                                               Int
                                               '[Text]))
                                     '[]))
                               '[]))
                         (Nub
                            (SortHelper
                               ('[Int]
                                :++ Nub
                                      (SortHelper
                                         ('[Text]
                                          :++ Nub
                                                (SortHelper2
                                                   (NotEqual
                                                      (GHC.TypeLits.CmpSymbol
                                                         (TypeName (Rep Text2 ()))
                                                         (TypeName (Rep Text1 ())))
                                                      'GT)
                                                   Text2
                                                   '[]
                                                   Text1
                                                   '[]))
                                         '[]))
                               '[])))

Not great :-)

Apr 05 '19 09:04 etorreborre

Actually @rahulmutt I tried something really simple which seems to work ok and here for a small test.

On my big project I observe that the cost of de-duplicating the type lists is amortized after the first to extract something from the registry, so that's a win! (plus I can get nicer lists of types).

This is actually so simple that I'm wondering what I doing wrong :-).

Apr 05 '19 13:04 etorreborre

That's a great, simple solution. Is it still OK that two lists contain the same elements but are not considered the same?

Following along the lines of your solution that avoids type-level comparisons, I've updated the Gist (see Main2.hs) which implements efficient tail-recursive type families that compute whether two lists of types are equal & union of two de-deduplicated lists only assuming we have the ability to check for type equality.

Apr 09 '19 09:04 rahulmutt

Is it still OK that two lists contain the same elements but are not considered the same?

In my context yes.

I've updated the Gist (see Main2.hs)

Nice! I also did some improvements for my specific problem by using TemplateHaskell) because after all with Template Haskell I can do a bit of type-checking and then emit a simplified expression without duplicate types. And now my compilation times are even faster.

Apr 09 '19 13:04 etorreborre

Is there a way to get linear performance out of Filter? The examples in https://blog.josephmorag.com/posts/databass3/ take a really really long time to compile as well. I haven't worked out the interaction with the FilterV class, but would changing the Filter type family to not use If recover laziness and run in O(n)?

type family Filter (f :: Flag) (p :: k) (xs :: [k]) :: [k] where
            Filter f p '[]       = '[]
            Filter f p (x ': xs) = Filter' f p x xs (Cmp x p)

type family Filter' (f :: Flag) (p :: k) (x :: k) (xs :: [k]) cmp where
            Filter' FMin p x xs LT = x ': Filter FMin p xs
            Filter' FMin p x xs eqOrGt = Filter FMin p xs
            Filter' FMax p x xs LT = Filter FMax p xs
            Filter' FMax p x xs eqOrGt = x ': Filter FMax p xs

Jan 21 '22 18:01 jmorag

Hm, apparently there's no way to reason about type family time complexity. I will investigate further.

Jan 21 '22 18:01 jmorag

type-level-sets type-level-sets copied to clipboard

Compilation performance

type-level-sets
type-level-sets copied to clipboard