
Possible issue with ghc-heap-view

Open · erikd opened this issue 8 years ago · 7 comments

Hi @nomeata, I've been using ghc-datasize, which sits on top of ghc-heap-view, and have been getting some strange errors out of the GHC RTS with both ghc-7.10.3 and ghc-8.0.1. I raised a GHC ticket (https://ghc.haskell.org/trac/ghc/ticket/12492), but I'm beginning to think that ghc-heap-view is actually the problem.

Any ideas?

erikd · Aug 18 '16 07:08

ghc-heap-view is very brittle and a hack, so this is very likely its fault, and finding the cause will require some C-level debugging. (Someone really needs to get a supported interface for this into GHC.)

nomeata · Aug 18 '16 13:08

I'm particularly interested in the functionality provided by ghc-datasize and I need it in ghc 7.10. Any suggestions for debugging this current problem?

Longer term, I would definitely be interested in helping get this into GHC.

erikd · Aug 18 '16 21:08

Beginning to think I should not use this code in production.

erikd · Aug 18 '16 22:08

> Any suggestions for debugging this current problem?

printf-debugging in ghc-heap-view/cbits/HeapView.c is a start. Try to find out what kind of closure it is looking at when it crashes.

> Beginning to think I should not use this code in production.

No! Don’t.

nomeata · Aug 18 '16 22:08

> No! Don’t.

OK, I'm convinced! :)

erikd · Aug 18 '16 22:08

@nomeata I did some printf debugging and found that the big allocation is made in the function slurpClosurezh, implemented in HeapViewPrim.cmm. The closure about to be inspected at that point is of type ARR_WORDS, and the size being allocated is exactly what it should be according to the return value of gtc_heap_view_closureSize.

Now, what I don't understand is that if I Control.DeepSeq.force the data structure, I don't hit this problem. In fact, if I add debug output that prints only the sizes of ARR_WORDS closures, most (all?) are smaller than about 8. Without forcing the structure, the very first ARR_WORDS closure in the program has a size of 129362, and the program fails.

erikd · Aug 21 '16 22:08

If you force the structure, there are no thunks, no pointers to code etc. Maybe this ARR_WORDS is some static reference to the whole code or something else that will turn out weird and pointless.

Do you think you can find out what closure is pointing to this big static array? We might find that we should simply skip a certain field of a certain static closure.
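On GHC 8.6 and later, the `ghc-heap` package that ships with GHC provides `GHC.Exts.Heap`, which lets Haskell code inspect closures directly; it is not available for the 7.10/8.0 versions discussed in this issue. Assuming such a newer GHC, the following minimal sketch illustrates why forcing changes what a heap walker sees. `closureTypeOf` is a hypothetical helper, not part of ghc-heap-view or ghc-datasize. It reports the closure type of a value's outermost heap object, both before and after `Control.DeepSeq.force`.

```haskell
-- Sketch only: assumes GHC >= 8.6, where the ghc-heap package
-- (GHC.Exts.Heap) ships with the compiler. Not available on GHC 7.10 / 8.0.
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import GHC.Exts.Heap (ClosureType, GenClosure (..), StgInfoTable (..), getClosureData)

-- | Hypothetical helper: the closure type (THUNK_*, CONSTR_*, ARR_WORDS, ...)
-- of the outermost heap object of a value, read without evaluating it.
closureTypeOf :: a -> IO ClosureType
closureTypeOf x = tipe . info <$> getClosureData x

main :: IO ()
main = do
  -- A lazily built list; nothing has demanded it yet.
  let xs = map (* 2) [1 .. 10 :: Int]
  before <- closureTypeOf xs
  putStrLn ("before force: " ++ show before)
  -- Fully evaluate the list, as Control.DeepSeq.force does in this thread.
  ys <- evaluate (force xs)
  after <- closureTypeOf ys
  putStrLn ("after force:  " ++ show after)
```

The program prints the closure type of the list before and after forcing. The exact constructor names (which THUNK_* or CONSTR_* variant, or an indirection) depend on the GHC version and optimisation level.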

nomeata · Aug 23 '16 11:08