wpf icon indicating copy to clipboard operation
wpf copied to clipboard

WPF control is not reclaimed on NET9 by GC

Open minaew opened this issue 1 year ago • 28 comments

Description

GC does not reclaim control memory in following scenario. The control is placed in the visual tree and then removed from it. Conditions must be met:

  • PresentationTraceSources.Refresh() call
  • there is a subscriber to the Unloaded event of control

Reproduction Steps

Run test TestLeak in project targeting net9.0-windows. NUnit version is 3.12.0.

using System;
using System.Diagnostics;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Threading;
using NUnit.Framework;

namespace Sample {
    [TestFixture]
    public class MemoryTests {
        [Test]
        public void TestLeak() {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Assert.IsFalse(wr.IsAlive);
        }

        static WeakReference AddAndRemoveImage(Window window) {
            var img = new Image();
            var wr = new WeakReference(img);
            var holder = new ImageHolder(img);
            window.Content = img;
            window.Show();
            DoEvents();
            window.Content = null;
            DoEvents();
            return wr;
        }

        static void DoEvents() {
            var frame = new DispatcherFrame();
            Dispatcher.CurrentDispatcher.InvokeAsync(() => frame.Continue = false, DispatcherPriority.Background);
            Dispatcher.PushFrame(frame);
        }

        class ImageHolder {
            readonly Image image;
            public ImageHolder(Image image) {
                this.image = image;
                this.image.Loaded += OnLoaded;
                this.image.Unloaded += OnUnloaded;
            }

            void OnLoaded(object sender, RoutedEventArgs e) {
                Debug.WriteLine("OnLoaded");
            }

            void OnUnloaded(object sender, RoutedEventArgs e) {
                Debug.WriteLine("OnUnloaded");
            }
        }
    }
}

Expected behavior

The test passes.

Actual behavior

The test fails.

Regression?

This works in net8.0 and net462.

Known Workarounds

No response

Configuration

.NET: 9.0.0-preview.6.24327.6 OS: Windows 10, Version 22H2 (OS Build 19045.4651) Architecture: x64

Other information

No response

minaew avatar Jul 12 '24 15:07 minaew

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

teo-tsirpanis avatar Jul 12 '24 15:07 teo-tsirpanis

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

EgorBo avatar Jul 12 '24 16:07 EgorBo

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

Thanks. I tried, still no success.

minaew avatar Jul 12 '24 16:07 minaew

Might be the same issue as https://github.com/dotnet/runtime/issues/104218 cc @VSadov

EgorBo avatar Jul 12 '24 16:07 EgorBo

The RetryAttribute helps. Less tries is required if test closes the window.

minaew avatar Jul 12 '24 16:07 minaew

I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.

teo-tsirpanis avatar Jul 12 '24 17:07 teo-tsirpanis

I would also suggest running GC and waiting for pending finalizers in your tests more than once. The object might not be released on the first try.

This also does not help, event increases the number of required retries.

minaew avatar Jul 16 '24 09:07 minaew

By the way, if I hide window (window.Hide()) the test becomes much more stable, but still not 100%.

minaew avatar Jul 16 '24 09:07 minaew

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

@EgorBo, can you please clarify what you mean by behavior change here?

mangod9 avatar Jul 18 '24 23:07 mangod9

Can you add [MethodImpl(MethodImplOptions.NoInlining)] in AddAndRemoveImage and try again? There's a chance the JIT has inlined AddAndRemoveImage, which might have extended the lifetime of img.

if it's inlined, then it means it's Tier1/Fullopts, hence, precise liveness is expected, so should also be claimed. I quickly tried the repro locally and it seems like there is indeed a change in behaviour between net8.0-windows and net9.0-windows for both Tier0 and Tier1, so shouldn't be JIT's fault.

@EgorBo, can you please clarify what you mean by behavior change here?

            GC.Collect();
            GC.WaitForPendingFinalizers();

do not longer call the finalizer right away in .NET 9.0 while it's consistently calling it in .NET 8.0. Bot for optimized codegen and unoptimized. I know it's not supposed to be 100% guaranteed, but it's just what the author is complaining

EgorBo avatar Jul 18 '24 23:07 EgorBo

Ok thanks. Adding @jkotas @VSadov as well, possibly related to moving finalizer loop to managed?

mangod9 avatar Jul 19 '24 18:07 mangod9

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

jkotas avatar Jul 21 '24 01:07 jkotas

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

net9-gc-issues.zip

minaew avatar Jul 22 '24 09:07 minaew

possibly related to moving finalizer loop to managed?

It is very unlikely to be related to that change.

I am not able to reproduce the issue. Could you please create a scratch project on github that can be cloned and run to reproduce it?

https://github.com/minaew/net9-gc-issues

minaew avatar Jul 22 '24 09:07 minaew

I am able to reproduce this. (Also it passes when targeting 8.0, so this is a 9.0 - specific change)

Even if I refactor the test like

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static WeakReference M1()
        {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            return wr;
        }

        [Test, RequiresThread(ApartmentState.STA)]
        public void TestLeak()
        {
            WeakReference wr = M1();

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            GC.WaitForPendingFinalizers();

            Assert.IsFalse(wr.IsAlive);
        }

it reproduces, which indicates that it may be not https://github.com/dotnet/runtime/issues/104218 .

https://github.com/dotnet/runtime/issues/104218 is so far believed to be a race and calling Collect/WFPF more than once typically helps with that.

VSadov avatar Jul 22 '24 15:07 VSadov

The test is not always failing. Sometimes it passes, but rarely.

When it is failing, the image is rooted via a static array

image

VSadov avatar Jul 22 '24 15:07 VSadov

If I change the test in the following way, the test always passes:

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static WeakReference M1()
        {
            PresentationTraceSources.Refresh();
            var window = new Window();
            var wr = AddAndRemoveImage(window);
            return wr;
        }

        [Test, RequiresThread(ApartmentState.STA)]
        public void TestLeak()
        {
            WeakReference wr = M1();

            for (int i = 0; i < 1000000; i++)
            {
                if (!wr.IsAlive)
                {
                    break;
                }

                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            Assert.IsFalse(wr.IsAlive);
        }

Basically - given enough time the image becomes unreachable and gets collected. But it may take spinning through up to 4000 calls to GC.Collect().

This does not look like an issue with GC or with finalization, but more like something needs to happen on the app/WPF/XAML side before it drops the reference to the image. It does drop it eventually, but it takes time. Perhaps there is a cache cleaned on timer, or something of that sort....

VSadov avatar Jul 22 '24 19:07 VSadov

Another observation - may be helpful for further investigations.

When I run the test (with the "loop until pass modification"), a white window shows up and takes the focus and runs for a while. Then, if I activate any other window, the test immediately passes. That is as if becoming no longer foreground window somehow results in quick cleanup of the image object.

VSadov avatar Jul 22 '24 19:07 VSadov

Another observation:

When I run this with PerfView recording finalization events I see some varying activity initially with all kind of objects finalizing, but then there is a long list of System.Gen2GcCallback finalizing a few instances per each GC (that is the array pools watching for GCs as cache trimming strategy), but no other types finalize up to the eventual passing of the test.

Dropping the reference to Image does not seem to be related to finalizing anything in particular.

VSadov avatar Jul 22 '24 20:07 VSadov

I do not think this is related to finalization thread changes. Possibly not related to the runtime at all.

I think someone more familiar with XAML/WPF side of things should take a look.

VSadov avatar Jul 22 '24 20:07 VSadov

should it be moved to dotnet/wpf? @lindexi @h3xds1nz does this net9 regression look familiar?

kasperk81 avatar Jul 23 '24 11:07 kasperk81

@kasperk81 Emmmm, I am not sure...

lindexi avatar Jul 23 '24 12:07 lindexi

As InvokeHandlersImpl method in WPF, it will pass the Image object to _traceArguments in EventRoute when the RouteEvent be invoked, see https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/PresentationCore/System/Windows/EventRoute.cs#L184-L191

That is why the PresentationTraceSources.Refresh is important. The PresentationTraceSources.Refresh will open the TraceRoutedEvent.IsEnabled.

And it do not cause the memory leak, and it will be free after trace. I do not yet know why the behavior difference occurred, I am continuing to investigate...

lindexi avatar Jul 25 '24 09:07 lindexi

The issues cause by https://github.com/dotnet/wpf/pull/6700

To reduce allocations when tracing routed events, we'll use a field to store it which cause this issues.

The EventRoute will be re-use, and it means that the _traceArguments will has the long life time.

Cc @bgrainger

~~Emmmmm, this issues is not easy to fix. And we can not clean the _traceArguments in Clear method,~~ ~~because the TraceRoutedEvent can not consumption timely and we ca not know when the tracer consume the arguments~~ Not Sure.

~~Sorry @bgrainger . Upon further investigation, I've found that your code is not-safe. As previously mentioned, we cannot definitively determine when the tracer will finish consuming the _traceArguments. This implies that during the reuse of the RoutedEvent, parameter pollution may occur, potentially leading to the tracer recording incorrect parameters. For instance, if I trigger the first RoutedEvent and record the parameters in _traceArguments and notify the tracer to consume them. Then, I reuse the same RoutedEvent object and update new parameters into _traceArguments. However, if the tracer hasn't finished consuming the first set of parameters, it means that the tracer will be using the second set of parameters when processing the first set, which is not as expected.~~

Update:

I think the _traceArguments can be clear in Clear method or TraceRoutedEvent.Trace finish. And the TraceSource will use the arrayList not the _traceArguments object. See https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/WindowsBase/MS/Internal/AvTrace.cs#L335-L339

lindexi avatar Jul 25 '24 09:07 lindexi

collection expression won't work?

                        TraceRoutedEvent.Trace(
                            TraceEventType.Start,
                            TraceRoutedEvent.InvokeHandlers,
-                          _traceArguments);
+                          [_routeItemList[i].Target, args, BooleanBoxes.Box(args.Handled)]);

kasperk81 avatar Jul 25 '24 10:07 kasperk81

Funny, I was looking at that PR/code last week and was thinking exactly that.

Good job @lindexi on the research.

h3xds1nz avatar Jul 25 '24 10:07 h3xds1nz

collection expression won't work?

I think this will still allocate. Possible solutions:

  1. Change object[] to a ReadOnlySpan. Then the collection expression won't allocate.

    In the end, the arguments are used in:

    • https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/WindowsBase/MS/Internal/AvTrace.cs#L261
    • https://github.com/dotnet/wpf/blob/df3f4bf5568830adc3ed2d3da244ee7a17d551df/src/Microsoft.DotNet.Wpf/src/PresentationFramework/MS/Internal/TraceData.cs#L96

    which only creates a string. All classes, methods and event delegates are internal, so I think this change can be made?

  2. Clear the array after it's being used. This is possible since the parameters are only used in the call stack and not being stored, however if down the line the code changes and the parameter is being stored and then being cleared, then it'll have invalid values. (This is already the case since the array is being re-used from the field).

GerardSmit avatar Jul 25 '24 11:07 GerardSmit

can this issue be moved to wpf repo ?

mangod9 avatar Jul 25 '24 15:07 mangod9

Hello there. Any updates on this?

alexdi220 avatar Mar 12 '25 09:03 alexdi220

Hello there. Any updates on this?

Cc @rchauhan18 and @dipeshmsft , could you review my PR(https://github.com/dotnet/wpf/pull/9463 )?

lindexi avatar Mar 12 '25 11:03 lindexi