mpl icon indicating copy to clipboard operation
mpl copied to clipboard

Hierarchical heaps code terminology update

Open shwestrick opened this issue 7 years ago • 4 comments

The terminology in the hierarchical heaps code needs some updates. Here are some terms that I think should change, and what I would change them to. I will add more as I encounter them.


hierarchical heap ⟹ superheap I think we better understand now that what we once called a "hierarchical heap" is actually a piece of the hierarchy with a very specific structure which deserves its own abstraction. Also, I find the term "hierarchical heap" confusing because it doesn't seem to line up with our terminology in papers and discussions:

  • We typically think of each node of the hierarchy as being a heap. In this interpretation, the hierarchy defines a parent/child relationship between heaps.
  • However, the term "hierarchical heap" suggests an interpretation where each heap of the hierarchy is a subtree of nodes, with the hierarchy providing a containment relationship between heaps. This is not the interpretation we usually use.

UPDATE: These objects could be called "pseudo-heaps"; see discussion.


level ⟹ level/depth Confusingly, sometimes "level" refers to one layer of a superheap; other times, it refers to a depth in the hierarchy. These should be distinguished. I think "level" is a good term for a layer of the superheap, but we should call "depth" for what it is.

highest ⟹ shallowest lowest ⟹ deepest Terms such as "high" and "low" are confusing for depths, since a low depth is high in the tree, and a high depth is low in the tree. "Shallow" and "deep" are unambiguous.


~~locallyCollectibleSize (LCS) ⟹ ?~~ ~~This currently refers to the size (in bytes) of the heaps which are local to a particular worker. I think this name means "the size of heaps which could be collected locally right now" but I find the name confusing. What is a better name?~~

~~locallyCollectibleHeapSize (LCHS) ⟹ ?~~ ~~This currently refers to a value relative to the locallyCollectibleSize (LCS) which is used to trigger local collections. Confusingly, it is not a size of a heap. Collections are currently triggered when the ratio LCHS/LCS goes below a set ratio (set by default to 2.0). I would prefer for this quantity to actually be a threshold in bytes. For example, call it the "localCollectionThreshold (LCT)", and then the condition for triggering a local collection would simply be LCS ≥ LCT.~~

UPDATE: These terms no longer exist in the implementation.


min chunk size ⟹ block size We currently have a command-line argument for the minimum chunk size, which is actually the block size used throughout the runtime. Blocks seem to be fundamental enough to how the runtime works to simply expose a "block size" command-line argument.

shwestrick avatar Dec 13 '18 17:12 shwestrick

This is somewhat in progress in my fork.

shwestrick avatar Dec 06 '19 17:12 shwestrick

Following the merge of unstable-cd changes into master with 810aaf99b3e857b3fcb8d5b111a44c64131f1115, the situation has changed a little bit:

  1. The name for heaps probably still should be changed, but "superheap" no longer seems to be the right term.
  2. The level/depth confusion has been mostly cleared up
  3. LCS and LCHS no longer exist. Replaced with new GC policy based upon bytesAllocatedSinceLastCollection and bytesSurvivedLastCollection in GC_thread
  4. min-chunk-size ⟹ block-size still needs to be done.

shwestrick avatar Jan 03 '20 23:01 shwestrick

(0ec7ccee32e3da139e6934440591f381b03a2b09) min-chunk-size ⟹ block-size is done.

shwestrick avatar Jan 26 '20 02:01 shwestrick

@typerSniper and I have settled on the term pseudo-heap for the struct HM_HierarchicalHeap objects.

I like this name, because pseudo-heaps are not heaps themselves, but rather artifacts of the heap implementation strategy:

  • A single "heap" (from the perspective of the theoretical heap hierarchy) can actually be made up of multiple pseudo-heaps, distributed across processors.
  • A pseudo-heap might not contain objects itself, but instead might defer to some other "representative" pseudo-heap. This indirection makes it possible to do a heap merge quickly, in only constant time.

shwestrick avatar Jul 16 '20 16:07 shwestrick