Replace bool[] sampleSet with HashSet<int>
Found while analyzing a .gcDump: there are 858 bool[] arrays on the LOH, the largest 4 MB each, taking 950 MB of space in total, almost empty and mostly not rooted. Replace with HashSet<int>.
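A minimal sketch of the trade-off described above (the sizes are illustrative, not taken from the .gcDump): a bool[] sized to the maximum node index pays one byte per index up front, easily large enough to land on the LOH, while a HashSet<int> only pays for the indices actually added.

```csharp
using System;
using System.Collections.Generic;

// A dense bool[] allocates the full index range regardless of population:
// 4,000,000 indexes => ~4 MB, well over the 85,000-byte LOH threshold.
bool[] dense = new bool[4_000_000];
dense[42] = true;

// A HashSet<int> grows with the number of elements actually present.
var sparse = new HashSet<int>();
sparse.Add(42);

Console.WriteLine(dense.Length);   // 4000000
Console.WriteLine(sparse.Count);   // 1
```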
I captured an ETW trace afterwards, HashSet<int>
@ninedan The issue arose when I was analyzing gcdump files with ~70-90 million nodes and extremely deep recursion (e.g. linked lists with 1 million+ nodes).
@sharwell, are you worried about HashSet.Add time or HashSet.Contains time? When you're analyzing a .gcDump file with 70-90 million nodes, how many nodes are they sampled down to? How much would the HashSet version cost, relative to total CPU usage?
> how many nodes are they sampled down to
I never do sampling. There were 70-90 million nodes in the visible stacks.
> How much would HashSet version cost, relative to total CPU usage?
I don't remember how much difference this particular change made. The full set of changes resulted in a reduction from several hours of processing down to a few seconds.
@sharwell, check the new change. I'm adding a new class, IndexSet, which automatically switches to a bit set when there are enough elements to make that representation more memory-efficient.
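A minimal sketch of how such a hybrid set might work, with illustrative names and an illustrative switch-over threshold (this is not the PR's actual IndexSet code): start sparse with a HashSet<int>, and migrate to a bit set once the per-element hash-set overhead exceeds the cost of the full bit array.

```csharp
using System;
using System.Collections.Generic;

var set = new IndexSet(capacity: 1_000_000);
Console.WriteLine(set.Add(42));       // True (newly added)
Console.WriteLine(set.Add(42));       // False (already present)
Console.WriteLine(set.Contains(42));  // True
Console.WriteLine(set.Contains(7));   // False

public sealed class IndexSet
{
    private HashSet<int> _sparse = new HashSet<int>();
    private ulong[] _bits;           // null until we switch to bit-set mode
    private readonly int _capacity;  // maximum index + 1

    public IndexSet(int capacity) => _capacity = capacity;

    public bool Add(int index)
    {
        if (_bits != null)
        {
            ulong mask = 1UL << (index & 63);
            bool added = (_bits[index >> 6] & mask) == 0;
            _bits[index >> 6] |= mask;
            return added;
        }

        bool result = _sparse.Add(index);

        // A HashSet<int> entry costs roughly 16 bytes (bucket + entry),
        // while the bit set costs capacity/8 bytes in total. Switch once
        // the bit set becomes the cheaper representation.
        if (_sparse.Count * 16L > _capacity / 8)
        {
            _bits = new ulong[(_capacity + 63) / 64];
            foreach (int i in _sparse)
                _bits[i >> 6] |= 1UL << (i & 63);
            _sparse = null;
        }
        return result;
    }

    public bool Contains(int index) =>
        _bits != null
            ? (_bits[index >> 6] & (1UL << (index & 63))) != 0
            : _sparse.Contains(index);
}
```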
@ninedan I'm not opposed to a change to reduce memory usage, but keep in mind that execution time is the overwhelming dominant concern. If someone were to submit a pull request that improved execution time at the expense of another 40 GiB of runtime memory overhead, I would happily support it (and merge it into my interner branch even if it doesn't make its way to master).
For this change, I'd like to see BenchmarkDotNet results showing that the new IndexSet class is no slower at scale at runtime than the current byte[] approach. If the performance is the same or better, then I would not be opposed to the change.
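Until BenchmarkDotNet results are available, a rough Stopwatch comparison can at least sanity-check the lookup cost. All names and sizes below are illustrative, and Stopwatch timings include JIT and GC noise, so this is a sketch, not a substitute for the requested benchmark:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Compare membership lookups on a dense bool[] versus a HashSet<int>
// over the same sparsely populated index range (~1% full).
int n = 1_000_000;
bool[] dense = new bool[n];
var sparse = new HashSet<int>();
for (int i = 0; i < n; i += 97)
{
    dense[i] = true;
    sparse.Add(i);
}

int denseHits = 0, sparseHits = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < n; i++) if (dense[i]) denseHits++;
long denseMs = sw.ElapsedMilliseconds;

sw.Restart();
for (int i = 0; i < n; i++) if (sparse.Contains(i)) sparseHits++;
long sparseMs = sw.ElapsedMilliseconds;

Console.WriteLine(denseHits == sparseHits);  // True: both report the same members
Console.WriteLine($"bool[]: {denseMs} ms, HashSet: {sparseMs} ms");
```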
@sharwell, can you provide a test case showing that CPU time spent in this integer set is of great importance?
I think CPU time is never the real issue. The real issue is always memory. With lots of LOH allocations, GC can be extremely expensive, causing heavy CPU usage. So the key is actually to reduce memory usage, especially LOH allocations. The current solution has at least two issues: 1) resizing the array in a loop, and 2) using a bool[] array where a bit set is enough.
The third issue is that when the range of the indexes is large enough, there can be thousands of big LOH allocations, as I saw in my analysis of a .gcDump file with 3 million objects.
The latest version addresses all three issues.
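The third issue could also be attacked with a paged bit set, sketched below under assumed names and an assumed 8 KB page size (this is illustrative, not the code in this change): no single allocation approaches the 85,000-byte LOH threshold, and pages for untouched index ranges are never allocated.

```csharp
using System;

var bits = new PagedBitSet(3_000_000);
bits.Add(2_999_999);
Console.WriteLine(bits.Contains(2_999_999));  // True
Console.WriteLine(bits.Contains(0));          // False

public sealed class PagedBitSet
{
    // 64K bits per page => 8 KB per ulong[] page, well under the LOH threshold.
    private const int PageBits = 64 * 1024;
    private readonly ulong[][] _pages;

    public PagedBitSet(int capacity) =>
        _pages = new ulong[(capacity + PageBits - 1) / PageBits][];

    public void Add(int index)
    {
        // Allocate the page lazily, only when its index range is first touched.
        ulong[] page = _pages[index / PageBits] ??= new ulong[PageBits / 64];
        page[(index % PageBits) >> 6] |= 1UL << (index & 63);
    }

    public bool Contains(int index)
    {
        ulong[] page = _pages[index / PageBits];
        return page != null && (page[(index % PageBits) >> 6] & (1UL << (index & 63))) != 0;
    }
}
```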
@ninedan Unfortunately this change predated when I was keeping track of specific test scenarios. I'll try to keep a watch for them in the future.
> I think CPU time is never the real issue. The real issue is always memory.
End-to-end execution time on the wall clock is the only consideration I use for making performance changes to address scaling problems in PerfView. Sometimes it's CPU usage for algorithms, and other times it's CPU usage from GC. Whatever it takes to get the application to finish faster without compromising the quality of results is the approach to go with.
I do think that both execution time and memory are concerns here. At this point, it's not clear to me whether we want to take this change. I think a prerequisite to taking this particular change is going to be figuring out what types of test cases we can run against it to see how it behaves under load. I've filed #1041 so that we can figure out what kinds of tests we should run on this.
@ninedan, thank you for the contribution. Let's work through the testing aspect of this and then we can evaluate the change.