
DwarfCompilationUnit.ReadData causes excessive memory usage on large binaries

Open dedmen opened this issue 4 years ago • 1 comments

None of the attributes parsed in DwarfSymbolProvider.DwarfCompilationUnit.ReadData are deduplicated or interned. For a big binary (in my case with about 900 MB of debug info) this causes extreme memory usage: within the first 100 compilation units my memory usage rises to 12 GB, and then parsing gets stuck there because I run out of memory.

As an ultra-ugly hotfix I added this in DwarfSymbolProvider.ParseCompilationUnits:

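public class StringInterner
{
    // Deduplicates equal objects (not only strings) so a single instance is shared.
    // meh, a dictionary stands in for a set here, see:
    // https://github.com/dotnet/runtime/issues/21603 https://stackoverflow.com/questions/7760364/how-to-retrieve-actual-item-from-hashsett
    // Note: this only deduplicates types that override Equals/GetHashCode.
    private readonly ConcurrentDictionary<object, object> stringBank = new ConcurrentDictionary<object, object>();

    public object InternObject(object str)
    {
        if (str == null) return str;

        // GetOrAdd returns the instance already stored for an equal key, or stores this one.
        // Unlike TryGetValue followed by AddOrUpdate, a concurrent caller cannot swap the stored instance.
        return stringBank.GetOrAdd(str, str);
    }
}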

private static DwarfCompilationUnit[] ParseCompilationUnits(byte[] debugData, byte[] debugDataDescription, byte[] debugStrings, NormalizeAddressDelegate addressNormalizer)
        {
            using (DwarfMemoryReader debugDataReader = new DwarfMemoryReader(debugData))
            using (DwarfMemoryReader debugDataDescriptionReader = new DwarfMemoryReader(debugDataDescription))
            using (DwarfMemoryReader debugStringsReader = new DwarfMemoryReader(debugStrings))
            {
                List<DwarfCompilationUnit> compilationUnits = new List<DwarfCompilationUnit>();

                StringInterner interner = new StringInterner();

                List<Task> tasksList = new List<Task>();

                while (!debugDataReader.IsEnd)
                {
                    DwarfCompilationUnit compilationUnit = new DwarfCompilationUnit(debugDataReader, debugDataDescriptionReader, debugStringsReader, addressNormalizer, interner);

                    tasksList.Add(Task.Run(() =>
                    {
                        // intern all attributes in separate threads

                        foreach (var compilationUnitSymbol in compilationUnit.Symbols)
                        {
                            compilationUnitSymbol.Attributes = 
                                compilationUnitSymbol.Attributes
                                    .Select(x => new KeyValuePair<DwarfAttribute, DwarfAttributeValue>(x.Key, interner.InternObject(x.Value) as DwarfAttributeValue))
                                    .ToDictionary(x => x.Key, x => x.Value);
                        }
                    }));

                    compilationUnits.Add(compilationUnit);
                }

                Task.WaitAll(tasksList.ToArray());

                return compilationUnits.ToArray();
            }
        }

This keeps my memory usage at the 400th compilation unit down to 7.7 GB, which is at least usable. I originally did the interning in DwarfCompilationUnit.data, but that took too much time: the data reading is already the performance bottleneck, so it's better not to add anything extra to it. Moving it out into a separate thread/task works well for me so far. One could probably intern the whole attribute instead of just the attribute value; I'm not sure that would be better, and I assume it won't.
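
Interning like this only saves memory if two separately parsed values compare equal, so the value type needs value-based Equals/GetHashCode. A minimal sketch of that idea, assuming a hypothetical AttrValue class (a stand-in for illustration, not SharpDebug's DwarfAttributeValue) and a hypothetical typed Interner<T>:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for an attribute value; NOT the SharpDebug type.
public sealed class AttrValue : IEquatable<AttrValue>
{
    public AttrValue(int type, string text) { Type = type; Text = text; }

    public int Type { get; }
    public string Text { get; }

    // Value equality: two instances with the same content are interchangeable.
    public bool Equals(AttrValue other) =>
        other != null && Type == other.Type && Text == other.Text;

    public override bool Equals(object obj) => Equals(obj as AttrValue);

    public override int GetHashCode()
    {
        unchecked { return (Type * 397) ^ (Text == null ? 0 : Text.GetHashCode()); }
    }
}

// Hypothetical typed interner: keeps the first instance seen for each distinct value.
public sealed class Interner<T> where T : class
{
    private readonly ConcurrentDictionary<T, T> bank = new ConcurrentDictionary<T, T>();

    // Returns the stored instance equal to 'value', storing 'value' if none exists yet.
    public T Intern(T value) => value == null ? null : bank.GetOrAdd(value, value);

    public int Count => bank.Count;
}
```

With this shape, interning two separately constructed `new AttrValue(1, "std")` objects yields the same reference, and the second object becomes garbage. If the value type only has reference equality, every lookup misses and the interner just adds overhead.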

dedmen, Oct 14 '21 13:10

The next, far more elaborate step would be noticing that almost all CUs contain the std:: namespace, with the same symbols inside it over and over again, which could all be deduplicated (the only things that differ are the offsets), but oof.
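
A rough sketch of what that could look like, under the assumption that a symbol can be split into an offset-free, immutable content part that is shared and a small per-occurrence part that keeps the offset. SymbolContent, SymbolRef and SymbolPool are hypothetical names for illustration, not SharpDebug types, and attributes that reference other entries by offset would need extra handling that is not shown here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical offset-free part of a symbol; shared between compilation units.
public sealed class SymbolContent : IEquatable<SymbolContent>
{
    private readonly int hash;

    public SymbolContent(string tag, string name, IReadOnlyList<SymbolContent> children)
    {
        Tag = tag; Name = name; Children = children;
        // Combine the cached child hashes so hashing a subtree stays cheap.
        unchecked
        {
            int h = tag.GetHashCode() * 31 + name.GetHashCode();
            foreach (var child in children) h = h * 31 + child.hash;
            hash = h;
        }
    }

    public string Tag { get; }
    public string Name { get; }
    public IReadOnlyList<SymbolContent> Children { get; }

    // Structural equality that deliberately ignores offsets.
    public bool Equals(SymbolContent other) =>
        ReferenceEquals(this, other) ||
        (other != null && hash == other.hash && Tag == other.Tag && Name == other.Name
         && Children.SequenceEqual(other.Children));

    public override bool Equals(object obj) => Equals(obj as SymbolContent);
    public override int GetHashCode() => hash;
}

// Per-occurrence part: the offset differs per compilation unit, the content is shared.
public struct SymbolRef
{
    public SymbolRef(ulong offset, SymbolContent content) { Offset = offset; Content = content; }
    public ulong Offset { get; }
    public SymbolContent Content { get; }
}

// Hypothetical pool that returns one canonical instance per distinct subtree.
public sealed class SymbolPool
{
    private readonly Dictionary<SymbolContent, SymbolContent> pool =
        new Dictionary<SymbolContent, SymbolContent>();

    public SymbolContent Canonicalize(SymbolContent content)
    {
        SymbolContent existing;
        if (pool.TryGetValue(content, out existing)) return existing;
        pool.Add(content, content);
        return content;
    }

    public int Count => pool.Count;
}
```

Children would be canonicalized before their parents, so an identical std:: subtree parsed from two compilation units collapses to a single SymbolContent tree while each unit keeps its own SymbolRef with its own offset. This pool is not thread-safe; it would need a lock or a concurrent dictionary if used from the parsing tasks.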

dedmen, Oct 14 '21 14:10