Avalonia icon indicating copy to clipboard operation
Avalonia copied to clipboard

Memory leak in master branch (automation related?)

Open BAndysc opened this issue 3 years ago • 16 comments

Describe the bug The latest master branch (tested 11.0.4 and 0.10.999-cibuild0020870-beta https://github.com/AvaloniaUI/Avalonia/commit/cea6bc27a06f25b4f2d3a73d3411e904b1767e74) has a memory leak, probably related to Automation code.

Note: this is a regression from the current stable 0.10.x version (which doesn't have automation related modules).

To Reproduce The repro is very simple - just add a button, press it and the button will not be freed anymore.

  1. Clone the repo: https://github.com/BAndysc/AvaloniaMenuLeak
  2. Checkout branch leak2
  3. Run the app
  4. The top text shows managed used memory, in the beginning it is less than 3 MBs
  5. Press the button Add control
  6. This will spawn a control, that has a reference to 1GB byte array, so obviously the ram usage will rise up to 1 GB
  7. Press Remove Control
  8. GC will free the control and datacontext almost instantly, and it will go down to 3 MBs, so far so good.
  9. Press Add control again
  10. This time click the button 'Click me'
  11. Press Remove control
  12. Notice this time the memory usage doesn't drop. The control is not freed!

Screenshots dotMemory screenshot: image

Desktop:

  • OS: Mac
  • Version: 0.10.999-cibuild0020870-beta (it works fine on 0.10.14)

BAndysc avatar Jun 01 '22 14:06 BAndysc

Possibly related to #7751

robloo avatar Jun 01 '22 21:06 robloo

Pretty weird, I used dotMemory as well to fix previous memory leak, and I used latest master (well, couple of days old now). And there was only one memory leak with old sample code.

@grokys I will take a look tomorrow, but if you know something about automation what might be a reason here, it can help.

@robloo while I am on it, I can fix that one too. Thanks for reminding.

maxkatz6 avatar Jun 01 '22 21:06 maxkatz6

Pretty weird, I used dotMemory as well to fix previous memory leak, and I used latest master (well, couple of days old now). And there was only one memory leak with old sample code.

if you mean my sample code, then the previous one (menu related) didn't have this button, I've discovered it accidentally, while trying to reproduce another leak

BAndysc avatar Jun 01 '22 21:06 BAndysc

@maxkatz6 If you want to close #7751 great! It's otherwise still on my list to complete before 11.0 but it's one of the lowest priority.

robloo avatar Jun 02 '22 00:06 robloo

Yeah, reproduced using Accessibility Insights to trigger automation. Got three weak references that live after button was detached: image

maxkatz6 avatar Jun 03 '22 02:06 maxkatz6

I think the reason for this memory leak is that the method responsible for removing one, two or three weak references is simply not called: I'm talking about Avalonia.Win32.Automation.Automation.AutomationNode.Release method: image

I was told by @maxkatz6 that @grokys may know where this method can/should be called. Pinged the second one, hopefully he will reply and help with fix.

flexxxxer avatar Aug 10 '23 21:08 flexxxxer

Hey @flexxxxer I just talked to @grokys and asked him if he has any idea. Unfortunately, this seems to be hard to debug as e.g. "dotmemory can't really help with COM leaks". So it may take quite a while for us to find the root cause and track this down. Thank you for your hands off on this :-)

timunie avatar Sep 01 '23 12:09 timunie

Thanks @timunie . To address the usage of Release - I believe this should be called by the win32 automation APIs but I may be mistaken, need to find a little time to actually look into it (I may have this completely wrong rn, can't really remember offhand how it all works).

grokys avatar Sep 01 '23 12:09 grokys

I have idea how to fix that (where call Release), will try implement it in few days

flexxxxer avatar Oct 06 '23 20:10 flexxxxer

This problem still happens with latest master (11.2.999-cibuild0047389-alpha), but only on macOS

BAndysc avatar Apr 18 '24 01:04 BAndysc

I am trying to understand the code, it would be great if anybody could go through this and help me understand it.

Native AvnAccessibilityElement holds a pointer to IAvnAutomationPeer.

https://github.com/AvaloniaUI/Avalonia/blob/d5ca06ff2432dcc4671a4ed795a5ce6a5cdadddc/native/Avalonia.Native/src/OSX/automation.mm#L56-L60

It is saved into a field here:

https://github.com/AvaloniaUI/Avalonia/blob/d5ca06ff2432dcc4671a4ed795a5ce6a5cdadddc/native/Avalonia.Native/src/OSX/automation.mm#L93-L100

And initWithPeer is called from acquire:

https://github.com/AvaloniaUI/Avalonia/blob/d5ca06ff2432dcc4671a4ed795a5ce6a5cdadddc/native/Avalonia.Native/src/OSX/automation.mm#L63-L91

acquire is called from few places. I.e. from accessibilityHitTest, afaik called by the OS when the user press a pointer

https://github.com/AvaloniaUI/Avalonia/blob/d5ca06ff2432dcc4671a4ed795a5ce6a5cdadddc/native/Avalonia.Native/src/OSX/automation.mm#L476-L482

hit comes from the RootProvider_GetPeerFromPoint call, which goes to c# world via microcom generated code (code reformated for visibility, i.e. removed try{}catch{}):

static void* RootProvider_GetPeerFromPoint(void* @this, AvnPoint point)
{
    IAvnAutomationPeer __target = (IAvnAutomationPeer)global::MicroCom.Runtime.MicroComRuntime.GetObjectFromCcw(new IntPtr(@this));
    var __result = __target.RootProvider_GetPeerFromPoint(point);
    return global::MicroCom.Runtime.MicroComRuntime.GetNativePointer(__result, true);
}

Lastly, let's see MicroComRuntime.GetNativePointer (also removed parts that are not important)

public static void* GetNativePointer<T>(T obj, bool owned = false) where T : IUnknown
{
    if (obj is IMicroComShadowContainer container)
    {
        container.Shadow ??= new MicroComShadow(container);
        void* ptr = null;
        var res = container.Shadow.GetOrCreateNativePointer(typeof(T), &ptr);
        if (owned)
            container.Shadow.AddRef((Ccw*)ptr);
        return ptr;
    }
    throw new ArgumentException("Unable to get a native pointer for " + obj);
}

So it creates MicroComShadow if needed, then increment a reference counter (AddRef). Wait a minute. So if this method increases the reference counter, somewhere it should be decremented. But where? I don't see in the code when those objects can be deallocated. MicroComShadow is the object that keeps the reference to automation classes, which eventually holds references to view classes. Please correct me if am wrong.

BAndysc avatar Apr 18 '24 03:04 BAndysc

https://github.com/AvaloniaUI/Avalonia/blob/master/src/Avalonia.Native/AvnAutomationPeer.cs is the managed part. So if that is cleaned up it should also clean up the native part.

Gillibald avatar Apr 18 '24 03:04 Gillibald

@Gillibald

As far as I understand, the problem is that the Native wrapper - MicroComShadow keeps a strong reference to AvnAutomationPeer, while MicroComShadow itself if rooted, thus never collected, neither AvnAutomationPeer nor Control nor DataContext in result, leading to massive leaks.

image

https://github.com/kekekeks/MicroCom/blob/2f698e682efba38feeaf4e84ca9a39052fe6cab3/src/MicroCom.Runtime/MicroComShadow.cs#L15-L20

internal IMicroComShadowContainer Target { get; }
internal MicroComShadow(IMicroComShadowContainer target)
{
    Target = target;
    Target.Shadow = this;
}

I am not exactly why MicroComShadow gets rooted (it makes sense since it has to be accessible from the unmanaged code, but I am not sure when it exactly it happens).

BAndysc avatar Apr 18 '24 11:04 BAndysc

I always was suspicious of these IAvnString usages: https://github.com/AvaloniaUI/Avalonia/blob/master/src/Avalonia.Native/AvnAutomationPeer.cs#L32-L40

You can clearly see that AvnString is created on C# side, but never disposed.

For file picker code I had a hack like this, ensuring that all of these strings will be disposed: https://github.com/AvaloniaUI/Avalonia/blob/27e40b820e667c4ab1c7edacf4d3c65dfe3c7f4a/src/Avalonia.Native/SystemDialogs.cs#L131-L148 Which are all disposed together after file picked: https://github.com/AvaloniaUI/Avalonia/blob/27e40b820e667c4ab1c7edacf4d3c65dfe3c7f4a/src/Avalonia.Native/SystemDialogs.cs#L56

As well as on ObjC side all strings (ideally) should be released as soon as possible. For example, right when they are converted to NSStrings: https://github.com/AvaloniaUI/Avalonia/blob/27e40b820e667c4ab1c7edacf4d3c65dfe3c7f4a/native/Avalonia.Native/src/OSX/SystemDialogs.mm#L418

maxkatz6 avatar Apr 18 '24 23:04 maxkatz6

But yes, whole AutomationPeer leaking is a bigger object/problem than some strings.

maxkatz6 avatar Apr 18 '24 23:04 maxkatz6

This issue is also on windows. We are using this with UIAutomation and we have that here. There is any news about it ?

Tenjim avatar Aug 26 '24 09:08 Tenjim