type-level-sets
type-level-sets copied to clipboard
Compilation performance
I have been trying to compute the union of 2 sets of 9 elements each and ... well the compiler is still running :-). Have you or anyone else investigating ways to speed up those type level computations?
Could you post the example? I haven't had any problems with this before. It shouldn't take that long!
I am using the following Cmp
instance to compare types deriving Generic
type family TypeName a :: Symbol where
TypeName Double = "Double"
TypeName Int = "Int"
TypeName String = "String"
TypeName (M1 D ('MetaData name _ _ _) f ()) = name
TypeName a = TypeName (Rep a ())
type family Cmp (a :: k) (b :: k) :: Ordering where
Cmp a b = CmpSymbol (TypeName a) (TypeName b)
And here is some code which takes a very long time to compile:
import GHC.Generics
import Prelude
import Data.Type.Set
newtype X0 = X0 Int deriving (Generic)
newtype X1 = X1 Int deriving (Generic)
newtype X2 = X2 Int deriving (Generic)
newtype X3 = X3 Int deriving (Generic)
newtype X4 = X4 Int deriving (Generic)
newtype X5 = X5 Int deriving (Generic)
newtype X6 = X6 Int deriving (Generic)
newtype X7 = X7 Int deriving (Generic)
newtype X8 = X8 Int deriving (Generic)
newtype X9 = X9 Int deriving (Generic)
newtype X10 = X10 Int deriving (Generic)
newtype X11 = X11 Int deriving (Generic)
newtype X12 = X12 Int deriving (Generic)
newtype X13 = X13 Int deriving (Generic)
newtype X14 = X14 Int deriving (Generic)
newtype X15 = X15 Int deriving (Generic)
newtype X16 = X16 Int deriving (Generic)
newtype X17 = X17 Int deriving (Generic)
newtype X18 = X18 Int deriving (Generic)
newtype X19 = X19 Int deriving (Generic)
type R0_to_9 = '[
X0
, X1
, X2
, X3
, X4
, X5
, X6
, X7
, X8
, X9
]
type R10_to_19 = '[
X10
, X11
, X12
, X13
, X14
, X15
, X16
, X17
, X18
, X19
]
type R0_to_19 = '[
X0
, X1
, X2
, X3
, X4
, X5
, X6
, X7
, X8
, X9
, X10
, X11
, X12
, X13
, X14
, X15
, X16
, X17
, X18
, X19
]
rX_0_9 :: Set R0_to_9
rX_0_9 =
s (X0 1) `append`
s (X1 1) `append`
s (X2 1) `append`
s (X3 1) `append`
s (X4 1) `append`
s (X5 1) `append`
s (X6 1) `append`
s (X7 1) `append`
s (X8 1) `append`
s (X9 1)
rX_10_19 :: Set R10_to_19
rX_10_19 =
s (X10 1) `append`
s (X11 1) `append`
s (X12 1) `append`
s (X13 1) `append`
s (X14 1) `append`
s (X15 1) `append`
s (X16 1) `append`
s (X17 1) `append`
s (X18 1) `append`
s (X19 1)
s :: a -> Set '[a]
s a = Ext a Empty
rx_0_18 :: Set R0_to_19
rx_0_18 = rX_0_9 `union` rX_10_19
Maybe you have a better idea for the Cmp
instance which would make things faster to compile but I suspect that this is only the typelevel computations which are creating too big types. For the record I have more than an intellectual interest in the subject, I am actually planning to use this type-level union of types for a dependency-injection library :-).
Ouch this IS slow. I will look into it when I can. I tried this version but I used an instance of Cmp
rather than redefining the whole family, i.e.
type instance Cmp a b =
CmpSymbol (TypeName a) (TypeName b)
Is that ok?
Is there a "canonical" way to compare types based on their Typeable
instance? This is what I would like to have eventually.
And thanks for having a look. My impression is that short of having compiler support for type-level sets we will always be quite slow but I hope I'm wrong (I have to put up to 100 elements in such a structure).
As a quick way to see how the type family is being evaluated, you can do:
ghc-options: -ddump-tc-trace -ddump-to-file -dsupress-all
Search for *.ddump-tc-trace
inside your .stack-work
or dist
folders.
EDIT: The file gets very big very fast. You may want to terminate the build and just inspect it. I suggest you use a terminal editor like vim/nano since they can handle crazy big files well.
Thanks for the tip, what kind of insights should I be after when reading such a file? Do you think I'd be able to spot some alternative way of doing typelevel sets? My impression is that this is all slow because we are doing a typelevel quicksort after all!
Main thing is to grep for a particular type function to see what arguments it's being applied to.
It turns out the implementation of Filter
in this library takes exponential time and not linear because If
isn't lazy since it's a type-level function and type functions are strict. Hence it's evaluating both branches of the If
before selecting on the condition.
I did fix this in a local branch (adding a lot of boilerplate) and it ran much faster but I ended up getting a type error with your example program for some reason. Need to take a closer look.
When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.
Thanks a lot @rahulmutt for having a look at this, you must be otherwise terribly busy!
When you say you need type-level sets, do you just need sets at the type level only or you need the term level to reflect the type level (like this library does). I think if you clearly specify the requirements for your particular use-case, it'll be easier to help you.
I just need sets at the type level in my registry
library to track the inputs and outputs of functions stored in the Registry. When I use registry
for assembling Hedgehog generators I have a case where I have approximately 250 types to track but only 140 once deduplicated. And checking that I can get an element out of the registry takes around 15 to 20 seconds. If I could reduce that time I would do a compile-check in my code instead of discovering that a generator is missing at runtime.
@etorreborre The following code compiles in about 2 seconds on my machine:
https://gist.github.com/rahulmutt/98a12cb3eec9fe800081a627a8314526
It uses insertion sort which should be OK for small lists - you probably won't be dealing with type-level lists large than, say 1000. Let me know if you need it to run even faster.
Thanks @rahulmutt I wanted to try your solution at my real life solution but I'm bumping into another issue which I also did not know how to solve. This is a closed type family
type family TypeName a :: Symbol where
TypeName Int = "Int"
TypeName Text = "Text"
TypeName (M1 D ('MetaData name _ _ _) f ()) = name
TypeName a = TypeName (Rep a ())
but I don't know in advance all the types I am going to stick in my registry. Also I'd rather avoid imposing that people derive Generic
or anything else on the components they put in the registry. I remember searching for a solution for this but it didn't seem that GHC gives us any handle here. Which is quite weird because it should have a handle of the symbol name corresponding to a type, right?
It really doesn't seem to be possible, otherwise we wouldn't have https://hackage.haskell.org/package/compare-type-0.1.1/docs/Type-Compare.html :-(.
TypeName
can be open - the algorithm does not depend on its closed-ness. The user will have to specify an instance for TypeName
for every input/output type they put into a registry OR make it derive Generic.
As of now, it's not possible to get a Symbol from an arbitrary type without having that type having deriving Generic in the first place. The only way you can get a canonical ordering of all types is if GHC supported reifying the TypeRep
fingerprint to the type-level as a Nat
or if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types.
The user will have to specify an instance for TypeName for every input/output type they put into a registry OR make it derive Generic
Unfortunately that wouldn't be very practical because some types come from libraries and my intention is to reduce boilerplate as much as possible. Maybe this means a trade-off between ease-of-use and compilation times in some cases.
if it supported a way to get type-level metadata about a type without deriving Generic using a built-in type family whose instances are supplied by the compiler itself using its deep knowledge of types
I doubt that the GHC team is willing to support such an obscure use case. Maybe if I can find other people needing sorting of arbitrary type lists we could make a collective case for it?
I just realized that I would have other issues to solve even if I could solve that one: error messages! I would to a nicer representation of the type level sets otherwise errors would look like:
No instance for (IsSubset
(Nub
(SortHelper
(Nub
(SortHelper
('[Int]
:++ Nub
(SortHelper2
(NotEqual
(GHC.TypeLits.CmpSymbol
(TypeName (Rep Text1 ())) "Int")
'GT)
Text1
'[]
Int
'[Text]))
'[]))
'[]))
(Nub
(SortHelper
('[Int]
:++ Nub
(SortHelper
('[Text]
:++ Nub
(SortHelper2
(NotEqual
(GHC.TypeLits.CmpSymbol
(TypeName (Rep Text2 ()))
(TypeName (Rep Text1 ())))
'GT)
Text2
'[]
Text1
'[]))
'[]))
'[])))
Not great :-)
Actually @rahulmutt I tried something really simple which seems to work ok and here for a small test.
On my big project I observe that the cost of de-duplicating the type lists is amortized after the first to extract something from the registry, so that's a win! (plus I can get nicer lists of types).
This is actually so simple that I'm wondering what I doing wrong :-).
That's a great, simple solution. Is it still OK that two lists contain the same elements but are not considered the same?
Following along the lines of your solution that avoids type-level comparisons, I've updated the Gist (see Main2.hs
) which implements efficient tail-recursive type families that compute whether two lists of types are equal & union of two de-deduplicated lists only assuming we have the ability to check for type equality.
Is it still OK that two lists contain the same elements but are not considered the same?
In my context yes.
I've updated the Gist (see
Main2.hs
)
Nice! I also did some improvements for my specific problem by using TemplateHaskell) because after all with Template Haskell I can do a bit of type-checking and then emit a simplified expression without duplicate types. And now my compilation times are even faster.
Is there a way to get linear performance out of Filter
? The examples in https://blog.josephmorag.com/posts/databass3/ take a really really long time to compile as well. I haven't worked out the interaction with the FilterV
class, but would changing the Filter
type family to not use If
recover laziness and run in O(n)?
type family Filter (f :: Flag) (p :: k) (xs :: [k]) :: [k] where
Filter f p '[] = '[]
Filter f p (x ': xs) = Filter' f p x xs (Cmp x p)
type family Filter' (f :: Flag) (p :: k) (x :: k) (xs :: [k]) cmp where
Filter' FMin p x xs LT = x ': Filter FMin p xs
Filter' FMin p x xs eqOrGt = Filter FMin p xs
Filter' FMax p x xs LT = Filter FMax p xs
Filter' FMax p x xs eqOrGt = x ': Filter FMax p xs
Hm, apparently there's no way to reason about type family time complexity. I will investigate further.