haskell-language-server

Storing dependencies contributes significantly to memory usage

Open mpickering opened this issue 2 years ago • 5 comments

I ran a profile after loading GHC into HLS and saw that many of the large sources of allocation are due to big lists of keys.

Total: 1.8 million allocated Key values, accounting for 90MB (10%) of live data

hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key:89794800:1870725:48:48.0

1.4 million list cells containing keys, 34MB

ghc-prim:GHC.Types::[hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key,ghc-prim:GHC.Types::]:34671936:1444664:24:24.0

1.2 million (GetModSummaryWithoutTimestamps, NormalizedFilePath) key tuples, 29MB

ghc-prim:GHC.Tuple:(,)[ghcide-1.7.0.1-inplace:Development.IDE.Core.RuleTypes:GetModSummaryWithoutTimestamps,lsp-types-1.4.0.1-bc66547fb74f5fe287e9850a7cf5b08fad3aa0baf55d7ffc8b91807e767ee251:Language.LSP.Types.Uri:NormalizedFilePath]:29105232:1212718:24:24.0
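The first two numbers in each census line appear to be the total bytes and the number of closures. That reading of the ghc-debug output format is an assumption, but it agrees with the 90MB / 1.8 million summary above. Dividing one by the other gives the size of a single closure. The sketch below does that arithmetic, assuming a 64-bit heap with 8-byte words:

```haskell
-- Per-closure sizes implied by the ghc-debug census lines quoted above.
-- Assumption: fields are "total bytes" and "closure count"; 64-bit words.

data CensusLine = CensusLine
  { label      :: String
  , totalBytes :: Int
  , count      :: Int
  }

wordSize :: Int
wordSize = 8

census :: [CensusLine]
census =
  [ CensusLine "Key" 89794800 1870725
  , CensusLine "(:) cells holding Keys" 34671936 1444664
  , CensusLine "(GetModSummaryWithoutTimestamps, NormalizedFilePath)" 29105232 1212718
  ]

-- Average size of one closure in bytes.
bytesPerClosure :: CensusLine -> Int
bytesPerClosure c = totalBytes c `div` count c

-- The same size expressed in machine words.
wordsPerClosure :: CensusLine -> Int
wordsPerClosure c = bytesPerClosure c `div` wordSize

main :: IO ()
main = mapM_ report census
  where
    report c =
      putStrLn (label c ++ ": " ++ show (bytesPerClosure c)
                ++ " bytes = " ++ show (wordsPerClosure c) ++ " words")
```

This yields 6 words for each Key and 3 words for each list cell and each tuple. Assuming a non-profiling build with a one-word constructor header, a Key is a header plus 5 pointer fields, while a cons cell or a pair is a header plus 2 fields.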

mpickering · Jun 16 '22 11:06

These do not look like reverse dependencies to me. Reverse deps are stored in a HashSet:

https://github.com/haskell/haskell-language-server/blob/30d48ed705e929e4ff7da16b00ca4946ea850407/hls-graph/src/Development/IDE/Graph/Internal/Types.hs#L99-L102

pepeiborra · Jun 18 '22 06:06

More generally, I noticed the space usage of build keys a while ago. I tried to maximise sharing of NormalizedFilePath values by hash-consing them here:

https://github.com/haskell/lsp/pull/340

But I then reverted that change because it had its own set of problems:

https://github.com/haskell/lsp/pull/344

Instead, I applied a more localised fix for the worst offender:

https://github.com/haskell/haskell-language-server/pull/1996
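Hash-consing (interning) keeps one canonical copy of each distinct value and hands that copy to every caller, so equal values share a single heap object. The sketch below is a minimal, pure version of the idea. It is not the code from haskell/lsp#340. It uses an ordered Data.Map instead of a hash table and threads the table through as explicit state instead of keeping a global one. The names intern and internAll are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical interning table: each value maps to its canonical copy.
type InternTable a = Map.Map a a

-- Return the canonical copy of x and the (possibly extended) table.
intern :: Ord a => a -> InternTable a -> (a, InternTable a)
intern x table =
  case Map.lookup x table of
    Just canonical -> (canonical, table)         -- reuse the stored heap object
    Nothing        -> (x, Map.insert x x table)  -- first occurrence becomes canonical

-- Intern every element of a list, threading the table through.
internAll :: Ord a => [a] -> InternTable a -> ([a], InternTable a)
internAll [] table = ([], table)
internAll (x : xs) table =
  let (x', table')   = intern x table
      (xs', table'') = internAll xs table'
  in (x' : xs', table'')

main :: IO ()
main = do
  -- Two equal but separately built paths collapse to one table entry.
  let paths = ["src/A.hs", "src/" ++ "A.hs", "src/B.hs"]
      (_shared, table) = internAll paths Map.empty
  print (Map.size table)
```

One general drawback of this approach is that the table only grows unless entries are evicted, for example via weak pointers. That is not necessarily the problem that motivated the revert.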

pepeiborra · Jun 18 '22 07:06

Question: are the space usage stats produced by ghc-debug aware of sharing?

pepeiborra · Jun 18 '22 07:06

Sorry, these aren't reverse dependencies but the direct dependencies stored in ResultDeps.

Regarding "are the space usage stats produced by ghc-debug aware of sharing?": yes, I believe so.

I think a large part of the problem in this case is that Key values aren't shared: toKey allocates a new tuple on every call, even though there are only about 15,000 distinct keys in this example.
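The following is a rough estimate of what sharing could save. It assumes every Key closure stays at 48 bytes and that each of the roughly 15,000 distinct keys is represented by a single shared closure:

```latex
\begin{aligned}
\text{unshared Keys:}\quad & 1\,870\,725 \times 48\ \text{B} = 89\,794\,800\ \text{B} \approx 90\ \text{MB} \\
\text{one closure per distinct Key:}\quad & 15\,000 \times 48\ \text{B} = 720\,000\ \text{B} \approx 0.7\ \text{MB}
\end{aligned}
```

Sharing would not shrink the 34MB of list cells. Each stored dependency still needs its own cons cell. Only the objects those cells point to would be deduplicated.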

Also, the definition of Key as data Key = forall a . (Typeable a, Eq a, Hashable a, Show a) => Key a means that each Key constructor needs to store 5 pointers (4 to the class dictionaries and 1 to the wrapped value). This could possibly be reduced to 1 pointer for the class dictionaries if we had:

class (Typeable a, ...) => C a
instance (Typeable a, ...) => C a
data Key = forall a. C a => Key a
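Below is a compilable sketch of the single-dictionary encoding, with the elided constraints spelled out as the four from the original Key definition. The Eq, Hashable and Show instances for Key are illustrative assumptions, not the hls-graph code. Hashable comes from the hashable package. On a 64-bit heap with a one-word header, each Key would shrink from 6 words (48 bytes, matching the census) to 3 words (24 bytes). That assumes the C dictionary is shared, for example built once per payload type, rather than rebuilt on every Key construction.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Hashable (Hashable (..))  -- from the hashable package
import Data.Typeable (Typeable, cast)

-- One class bundling every constraint a Key payload needs. Its dictionary
-- points to the four superclass dictionaries.
class (Typeable a, Eq a, Hashable a, Show a) => C a
instance (Typeable a, Eq a, Hashable a, Show a) => C a

-- Fields: one C dictionary pointer plus one pointer to the payload.
data Key = forall a. C a => Key a

instance Eq Key where
  -- Equal only if the payloads have the same type and compare equal.
  Key a == Key b = case cast b of
    Just b' -> a == b'
    Nothing -> False

instance Hashable Key where
  hashWithSalt salt (Key a) = hashWithSalt salt a

instance Show Key where
  show (Key a) = show a

main :: IO ()
main = do
  print (Key (1 :: Int) == Key (1 :: Int))      -- same type, same value
  print (Key (1 :: Int) == Key (1 :: Integer))  -- different payload types
  print (Key "GetModSummary")
```

Because every constraint is reachable from the single C dictionary through its superclasses, the instances for Key can still use cast, (==), hashWithSalt and show on the payload.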

wz1000 · Jun 18 '22 09:06

Is this still a problem?

michaelpj · Jan 16 '24 18:01