Diff Heap Dumps

Open eclipsewebmaster opened this issue 9 months ago • 14 comments

| | |
| --- | --- |
| Bugzilla Link | 283778 |
| Status | ASSIGNED |
| Importance | P3 enhancement |
| Reported | Jul 16, 2009 20:25 EDT |
| Modified | Oct 01, 2023 13:02 EDT |
| Version | 0.8 |
| Depends on | 298078 |
| See also | Gerrit change https://git.eclipse.org/r/158932, Git commit 5f7e344d, Gerrit change 170710, Git commit 6ed8a869, Gerrit change 170909, Git commit 36277239, Gerrit change 171099, Git commit 7fd03e48 |
| Reporter | Nathan Reynolds |

Description

Diffing heap dumps is an extremely valuable ability. However, few people know how to diff two heap dumps, because there is no common way to link an object in one dump to its counterpart in the other. Here is an algorithm that works fairly well. Please implement it soon; this is a much-needed feature.

  1. In each heap dump, create a "Merge Shortest Paths to GC Roots" excluding weak/soft references for all objects.
  2. For each root in heap dump 1, pair it with the matching root in heap dump 2.
  3. For each pair of roots, pair the children using the parent's variable names.
  4. For each pair of children, pair the grand-children using the children's variable names.
  5. Repeat for each pair in the heap dumps.
  6. If an object doesn't have a matching object in the other heap, pair it to nothing and don't process its children.
  7. When finished pairing up objects, list all of the roots and add columns which show the difference in memory and object count between the two heaps. For those objects that are paired to nothing, use 0 for the memory usage of nothing.
  8. By default, sort in descending order on the memory usage difference.
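
The steps above can be sketched roughly as follows. This is only an illustration in Python, not MAT code: `Node`, `DiffRow`, `subtree_totals`, `diff` and `diff_roots` are hypothetical names, and each heap dump is assumed to have already been reduced to a tree (step 1) in which every object appears once and children are keyed by the parent's variable name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one object in the shortest-paths tree."""
    shallow_size: int
    # Children keyed by the name of the parent's variable that refers to them.
    children: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class DiffRow:
    name: str
    size_delta: int   # bytes in heap 2 minus bytes in heap 1
    count_delta: int  # objects in heap 2 minus objects in heap 1
    children: List["DiffRow"]


def subtree_totals(node: Optional[Node]):
    """Return (bytes, object count) for a subtree; 'nothing' counts as (0, 0)."""
    if node is None:
        return 0, 0
    size, count = node.shallow_size, 1
    for child in node.children.values():
        child_size, child_count = subtree_totals(child)
        size += child_size
        count += child_count
    return size, count


def diff(name: str, a: Optional[Node], b: Optional[Node]) -> DiffRow:
    size_a, count_a = subtree_totals(a)
    size_b, count_b = subtree_totals(b)
    rows: List[DiffRow] = []
    # Step 6: an object paired to nothing is not descended into.
    if a is not None and b is not None:
        # Steps 3-5: pair children by variable name, recursively.
        for var in sorted(a.children.keys() | b.children.keys()):
            rows.append(diff(var, a.children.get(var), b.children.get(var)))
        # Step 8: largest memory difference first.
        rows.sort(key=lambda r: r.size_delta, reverse=True)
    return DiffRow(name, size_b - size_a, count_b - count_a, rows)


def diff_roots(roots1: Dict[str, Node], roots2: Dict[str, Node]) -> List[DiffRow]:
    """Step 2 and step 7: pair roots by name and list them with their deltas."""
    names = sorted(roots1.keys() | roots2.keys())
    rows = [diff(n, roots1.get(n), roots2.get(n)) for n in names]
    rows.sort(key=lambda r: r.size_delta, reverse=True)
    return rows
```

Calling `diff_roots(heap1_roots, heap2_roots)` would give the top-level rows; expanding a row corresponds to reading its `children`. How roots are named and matched across two real dumps is left open here, as it is in the description.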

The user can then start expanding nodes in the tree to see where an object in one heap dump is holding onto more objects than its counterpart in the other heap dump. This is where the leak is occurring. The user then has to figure out why the leaked objects are being left in their "parent" object, which is beyond the scope of a memory analyzer.

In the first phase, get the above working. In the second phase, we have to deal with collections. Collections are difficult because every element is reached through the same variable, so variable names cannot tell the children apart.

When the algorithm hits a collection, it should start by grouping the child objects by type (class name). For some uses of collections, this splits the children into several sets. For most uses of collections, all the children are of exactly the same type, so this doesn't help.

The next thing it should do depends upon the collection.

Lists - use the index of the element as a guide.
Maps - use the key to match like elements.
Sets - no ideas on this one. Good luck.
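
As a rough illustration of the type, list and map rules above, here is a small Python sketch. The function names are hypothetical, and it assumes the collection contents have already been extracted from the dump into plain Python lists and dicts, with map keys reduced to comparable values such as the contents of a String. Sets are left out, as above.

```python
from collections import defaultdict
from itertools import zip_longest


def split_by_type(children):
    """First pass: group (class_name, obj) children by class name."""
    groups = defaultdict(list)
    for class_name, obj in children:
        groups[class_name].append(obj)
    return dict(groups)


def pair_list(elems1, elems2):
    """Lists: pair elements by index; surplus elements pair with None."""
    return list(zip_longest(elems1, elems2, fillvalue=None))


def pair_map(entries1, entries2):
    """Maps: pair values by key; a key missing on one side pairs with None."""
    keys = list(entries1) + [k for k in entries2 if k not in entries1]
    return {k: (entries1.get(k), entries2.get(k)) for k in keys}
```

Index-based pairing for lists is only a guide: an insertion near the front of a list shifts every later element, so the pairs after that point would be misaligned.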

Some objects override equals() and hashCode(). It would be helpful if the user could supply the jar file where the class is located. MAT could then load the class, create instances of the objects and set the member variables to the values taken from the heap dump. The equals() and hashCode() methods will hopefully work and the objects can be paired.

A good heuristic would be to compare the member variables that are primitive types (e.g. char, boolean, byte, short, int, long) and Strings. The objects that are most similar should be paired.
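
One possible reading of this heuristic, sketched in Python: score each candidate pair by how many same-named primitive or String fields hold equal values, then pair greedily from the highest score down. The scoring rule, the greedy strategy and the names `similarity` and `pair_most_similar` are assumptions for illustration; objects are represented as plain dicts of field name to value.

```python
def similarity(fields1, fields2):
    """Count fields that have the same name and an equal value in both objects."""
    return sum(
        1 for name, value in fields1.items()
        if name in fields2 and fields2[name] == value
    )


def pair_most_similar(objs1, objs2):
    """Greedily pair objects by descending similarity.

    Returns (index_in_objs1, index_in_objs2) tuples; an object with no
    partner is paired with None.
    """
    scored = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(objs1)
         for j, b in enumerate(objs2)),
        key=lambda t: (-t[0], t[1], t[2]),  # best score first, then stable by index
    )
    used1, used2, pairs = set(), set(), []
    for score, i, j in scored:
        # Pairs with no field in common are treated as not matching at all.
        if score > 0 and i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            pairs.append((i, j))
    pairs += [(i, None) for i in range(len(objs1)) if i not in used1]
    pairs += [(None, j) for j in range(len(objs2)) if j not in used2]
    return pairs
```

Greedy matching compares every object in one group with every object in the other, so it is quadratic in group size; for large collections it would likely need to be restricted to the per-type groups first.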

In the GUI, allow the user to change the pairings as they see fit.

eclipsewebmaster, May 08 '24 15:05