
Skip frozen segments for commit accounting

Open · cshung opened this issue Sep 16 '22 · 12 comments

Frozen segments should be excluded from the commit accounting verification.

cshung · Sep 16 '22

Tagging subscribers to this area: @dotnet/gc. See info in area-owners.md if you want to be subscribed.

Issue Details

Frozen segments should be excluded from the commit accounting verification.

Author: cshung
Assignees: cshung
Labels: area-GC-coreclr
Milestone: -

ghost · Sep 16 '22

I assume it makes GCHeap::UpdateFrozenSegment that I added in https://github.com/dotnet/runtime/pull/49576 useless, right? (it reports to gc when vm commits more memory for a frozen segment)

EgorBo · Sep 16 '22

> I assume it makes GCHeap::UpdateFrozenSegment that I added in #49576 useless, right? (it reports to gc when vm commits more memory for a frozen segment)

It depends on what your goal was when you introduced UpdateFrozenSegment.

Let's start with the concept of commit accounting.

In the case of heap_hard_limit (i.e. COMPlus_GCHeapHardLimit), we want to restrict the GC heap from committing too much memory. To do so, we keep track of how much memory the GC heap uses through accounting variables (e.g. current_total_committed, committed_by_oh, ...). These variables represent how much memory is committed for a given purpose.

To keep these variables up to date, we add to and subtract from them whenever we commit or decommit within the GC. Sometimes the GC repurposes memory (e.g. moving a region from a heap to a free list), and we adjust our accounting in those cases as well.
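
As a minimal sketch of what that accounting looks like (names and shapes here are simplified stand-ins, not the actual gc.cpp code):

```cpp
#include <cstddef>

// Simplified sketch of hard-limit commit accounting; names loosely mirror
// the real variables but this is not the actual gc.cpp implementation.
static size_t heap_hard_limit;          // 0 means no limit configured
static size_t current_total_committed;  // what the GC believes is committed
static size_t committed_by_oh[3];       // per-object-heap breakdown (SOH/LOH/POH)

// Every commit goes through a gate like this: refuse the request if it
// would push the total past the limit, otherwise record it.
bool try_account_commit(size_t size, int oh)
{
    if (heap_hard_limit != 0 &&
        current_total_committed + size > heap_hard_limit)
    {
        return false;  // over the limit; the caller surfaces this as OOM
    }
    current_total_committed += size;
    committed_by_oh[oh] += size;
    return true;
}

// Every decommit makes the mirror-image adjustment.
void account_decommit(size_t size, int oh)
{
    current_total_committed -= size;
    committed_by_oh[oh] -= size;
}
```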

One concern is that, as the GC code base evolves, the accounting might become wrong; that's why we have various methods (e.g. the two places we are editing in this change) that check, using asserts, whether the accounting was done correctly. The idea is that if I loop through the heap data structures, the total committed bytes in those regions should match the accounting variables exactly.
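
In sketch form (the segment type and walk are hypothetical stand-ins for the real heap data structures):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified segment record for illustration only.
struct heap_segment
{
    char*         mem;        // start of the segment
    char*         committed;  // end of its committed range
    heap_segment* next;
};

// Recompute committed bytes from the heap data structures themselves...
size_t recompute_committed(heap_segment* first)
{
    size_t total = 0;
    for (heap_segment* seg = first; seg != nullptr; seg = seg->next)
        total += (size_t)(seg->committed - seg->mem);
    return total;
}

// ...and assert that it agrees exactly with the running accounting total.
void verify_commit_accounting(heap_segment* first, size_t accounted_total)
{
    assert(recompute_committed(first) == accounted_total);
}
```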

Now, let's get back to this change.

Since we didn't update the accounting variables on UpdateFrozenSegment, we need to adjust the check such that it matches.
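
Concretely, the adjustment amounts to having the recomputation skip read-only (frozen) segments, since commits for those were never added to the accounting variables. Continuing the hypothetical walk above, now with a flag marking frozen segments (the real GC tags them differently):

```cpp
#include <cstddef>

// Same hypothetical segment record as before, plus a frozen-segment flag.
struct ro_heap_segment
{
    char*            mem;
    char*            committed;
    bool             read_only;  // true for frozen segments
    ro_heap_segment* next;
};

size_t recompute_committed_skipping_frozen(ro_heap_segment* first)
{
    size_t total = 0;
    for (ro_heap_segment* seg = first; seg != nullptr; seg = seg->next)
    {
        if (seg->read_only)
            continue;  // never charged to the accounting variables, so skip
        total += (size_t)(seg->committed - seg->mem);
    }
    return total;
}
```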

Assuming your goal is to update heap_segment_committed and heap_segment_allocated so that they are reflected correctly in SOS, this change doesn't affect that.

On the other hand, if your goal is to make sure committed memory used for the frozen segments is counted towards heap_hard_limit, then we will probably want an alternative change that updates the accounting variables on UpdateFrozenSegment/RegisterFrozenSegment/UnregisterFrozenSegment instead.
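
In sketch form, that alternative would run the frozen segment's bytes through the same gate, reusing the hypothetical try_account_commit/account_decommit helpers from above (the real signatures differ):

```cpp
#include <cstddef>

bool try_account_commit(size_t size, int oh);   // from the earlier sketch
void account_decommit(size_t size, int oh);     // from the earlier sketch

// Hypothetical: charge a frozen segment's committed bytes against the hard
// limit at registration; on failure the caller throws OutOfMemoryException.
bool register_frozen_segment_commit(size_t committed_size, int oh)
{
    return try_account_commit(committed_size, oh);
}

// Hypothetical: when UpdateFrozenSegment grows the committed range, only
// the delta needs to be charged.
bool update_frozen_segment_commit(size_t old_committed, size_t new_committed, int oh)
{
    return try_account_commit(new_committed - old_committed, oh);
}

// Hypothetical: UnregisterFrozenSegment gives the bytes back.
void unregister_frozen_segment_commit(size_t committed_size, int oh)
{
    account_decommit(committed_size, oh);
}
```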

In any case, this bug can be easily reproed by running under a checked build with DOTNET_GCHeapHardLimit turned on.

cshung · Sep 16 '22

Frozen segments allocated at runtime, like the ones added by Egor for string literals, should count towards the GC hard limit. From the user's point of view, they are no different from other memory managed by the GC. The fact that we populate this memory using a different mechanism is an internal implementation detail.

We may want to have a flag on frozen segments that says whether or not the segment counts towards the GC hard limit. Frozen segments that are backed by read-only file mappings should not count towards the GC hard limit.
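
A sketch of how such a flag could look (struct and field names are hypothetical, again reusing the try_account_commit gate from the earlier sketch):

```cpp
#include <cstddef>

bool try_account_commit(size_t size, int oh);  // from the earlier sketch

// Hypothetical registration info carrying an opt-in/opt-out flag.
struct frozen_segment_info
{
    void*  base;
    size_t committed_size;
    bool   charge_against_hard_limit;  // false for read-only file mappings
};

bool register_frozen_segment(const frozen_segment_info& info)
{
    if (!info.charge_against_hard_limit)
        return true;  // e.g. backed by a read-only file mapping: no charge
    return try_account_commit(info.committed_size, /* oh */ 0);
}
```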

jkotas · Sep 17 '22

If we count the committed bytes towards heap_hard_limit, there is a possibility that we run over the limit on these calls. @EgorBo, is it okay to throw OutOfMemory at the spot where you call RegisterFrozenSegment or UpdateFrozenSegment?

cshung · Sep 17 '22

> is it okay to throw OutOfMemory at the spot where you call RegisterFrozenSegment or UpdateFrozenSegment?

I think so; we already throw OOM in some cases from those call sites.

EgorBo · Sep 17 '22

UpdateFrozenSegment would need some work to implement the backout correctly; its call site assumes that it never fails. Also, the extra memory has already been committed by the time UpdateFrozenSegment is called, so the "damage" is already done.

Would it be easier for GC to ignore the fact that the frozen segment overshot the limit?

jkotas · Sep 17 '22

@Maoni0

cshung · Sep 19 '22

> Would it be easier for GC to ignore the fact that the frozen segment overshot the limit?

it would be much more desirable for GC to simply ignore the frozen segments. from GC's POV, these segments are something we only need to care about when we have to, ie, when they are in range, meaning between gc's g_lowest_address and g_highest_address, in which case GC will need to have the bookkeeping data structures for them. but other than that, GC just ignores them. GC isn't in charge of allocating them or managing objects on them. they are almost like native memory, so it'd be better for the components that allocate them to make sure there's memory for them.

UpdateFrozenSegment doesn't just update heap_segment_committed; it also updates heap_segment_allocated, and GC does need the allocated value to do the minimal bookkeeping (ie, clear the mark bits in sweep_ro_segments).
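
As a rough illustration of that minimal bookkeeping (the object-walk helpers here are hypothetical stand-ins for runtime internals):

```cpp
#include <cstddef>
#include <cstdint>

struct ro_segment
{
    uint8_t* mem;        // start of the segment
    uint8_t* allocated;  // heap_segment_allocated: end of the object range
};

// Hypothetical stand-ins for the runtime's real object walker.
size_t object_size(uint8_t* obj);   // size of the object starting at obj
void   clear_marked(uint8_t* obj);  // clear the object's mark bit

// Minimal per-GC bookkeeping for an in-range read-only segment: walk the
// objects up to heap_segment_allocated and clear any mark bits the mark
// phase may have set. This is why GC needs an accurate allocated value.
void sweep_ro_segment_sketch(ro_segment* seg)
{
    for (uint8_t* obj = seg->mem; obj < seg->allocated; obj += object_size(obj))
        clear_marked(obj);
}
```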

Maoni0 · Sep 20 '22

From outside the runtime, the string literals are managed heap memory just like any other. I expect that diagnostic tools will treat them as such. If the GC APIs exclude them, we may get questions about discrepancies between GC heap sizes reported by GC APIs vs. what's reported by diagnostic tools as the sum of all managed object sizes.

jkotas · Sep 20 '22

> From outside the runtime, the string literals are managed heap memory just like any other. I expect that diagnostic tools will treat them as such

actually, diagnostic tools already distinguish between different types of heaps (eg SOH vs LOH), so I'd imagine this would be another type of heap, and tools can display objects in this heap like they do with SOH/LOH/POH (we'll of course need to make sure we expose all the necessary info for tools to read ro segment info).

if someone wants to set a hardlimit for their app, they usually look at the heap in a memory tool. it seems like we could just tell them that ro segments are not charged against hardlimit in tooling. the only customer we currently have for ro segments allocates this memory themselves and knows clearly it's not charged against hardlimit.

in any case, my understanding is there'll need to be some diag work for ro segs now that they're used by the runtime. I'm interested to hear what the diag folks think the experience should be.

Maoni0 · Sep 20 '22

I commented over in https://github.com/dotnet/runtime/pull/49576#issuecomment-1252034120, but I think this issue is raising similar concerns. We need to decide how we will conceptualize this heap for users and then we want to make all of our accounting and quotas match that story. GCHeapHardLimit is one of ~10 different ways users might interact with this accounting.

I think there are three different stories we could tell:

a. Treat the Frozen Object Heap as a subdivision of some other part of the GC heap, for example considering it a subset of SOH gen2, or a subset of the pinned object heap. Two variations here: (i) we could disclose that this subdivision exists in our public docs/APIs/tools, or (ii) we could treat it as an undisclosed implementation detail.

b. Disclose the Frozen Object Heap as a new top-level category within the GC. We would now have 4 GC regions - SOH, POH, LOH, and Frozen Object Heap (FOH).

c. Disclose the Frozen Object Heap as a new memory region that holds managed objects, but is distinct from the GC entirely.

I think (a) and (b) require less effort from users to understand because they preserve the existing invariant that "GC heap" == "the pool of memory in which all managed objects are allocated". For people who only want or need a simple understanding of .NET memory, this is a nice abstraction.

In terms of labor, anything that exposes the Frozen Object Heap as a new developer-visible concept that they observe and reason about through tools/APIs/docs will add a bunch of touch points (example: https://github.com/dotnet/diagnostics/issues/1408).

If the Frozen Object Heap will typically be small and not very impactful for performance investigation, I lean towards option (a-ii) to minimize the total work. If the Frozen Object Heap is large enough that it needs explicit treatment, then I lean towards option (b).

> We may want to have a flag on frozen segments that says whether or not the segment counts towards the GC hard limit. Frozen segments that are backed by read-only file mappings should not count towards the GC hard limit.

How would we conceptualize this for users? It seems like we've conceptually split the FOH into a FileMapped FOH and a non-FileMapped FOH, treated the FileMapped FOH as option (c), and treated the non-FileMapped FOH as (a) or (b). We would be adding a decent amount of user-visible complexity in service of (so far) one perf optimization.

noahfalk · Sep 20 '22

While the discussion is still in flux, can we merge this change for now?

Without this change, any app that runs under heap_hard_limit and uses the frozen object heap fails with FATAL_GC_ERROR whenever a GC runs.

cshung · Sep 20 '22

While the discussion is still in flux, can we merge this change for now?

Sounds good to me.

jkotas · Sep 20 '22

The CI is ready; once I have a code review approval, I can get this merged.

cshung · Sep 20 '22