haskell-language-server
haskell-language-server copied to clipboard
Storing dependencies contributes significantly to memory usage
I ran a profile after loading GHC into HLS and see that many of the large sources of allocation are due to to big lists of keys.
Total: 1.8 million allocated key values for 90MB (10%) of live data
hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key:89794800:1870725:48:48.0
1.4 million, 34MB of lists containing keys
ghc-prim:GHC.Types::[hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key,ghc-prim:GHC.Types::]:34671936:1444664:24:24.0
1.2 million, 29MB of the GetModSummaryWithoutTimestamps Key
ghc-prim:GHC.Tuple:(,)[ghcide-1.7.0.1-inplace:Development.IDE.Core.RuleTypes:GetModSummaryWithoutTimestamps,lsp-types-1.4.0.1-bc66547fb74f5fe287e9850a7cf5b08fad3aa0baf55d7ffc8b91807e767ee251:Language.LSP.Types.Uri:NormalizedFilePath]:29105232:1212718:24:24.0
These do not look like reverse dependencies to me. Reverse deps are stored in a HashSet:
https://github.com/haskell/haskell-language-server/blob/30d48ed705e929e4ff7da16b00ca4946ea850407/hls-graph/src/Development/IDE/Graph/Internal/Types.hs#L99-L102
More generally, the space usage of build keys is something that I noticed a while ago. I tried to maximising sharing of NormalizedFilePath values by hashconsing them here:
https://github.com/haskell/lsp/pull/340
But then decided to revert that change since it has its own set of problems:
https://github.com/haskell/lsp/pull/344
And instead apply a more localised fix for the worst offender:
https://github.com/haskell/haskell-language-server/pull/1996
Question: are the space usage stats produced by ghc-debug aware of sharing?
Sorry, these aren't reverse dependencies but the direct dependencies stored in ResultDeps.
Question: are the space usage stats produced by ghc-debug aware of sharing?
Yes, I believe so.
I think in this case a large part of the problem is that Key values aren't shared, because toKey allocates a new tuple on every call, even though the total number of distinct keys is about ~15,000 in this example.
Also, the definition of Key as data Key = forall a . (Typeable a, Eq a, Hashable a, Show a) => Key a means that each Key contructor needs to store 5 pointers (4 to the class dictionaries). This could possibly be reduced to 1 pointer for the class dictionaries if we had
class (Typeable a, ...) => C a
instance (Typeable a, ...) => C a
data Key a = forall a. C a => Key a
Is this still a problem?