ironpython3
ironpython3 copied to clipboard
Clean up globals defined in Python source, make sure their __del__ is called
Fix #985
The GC.Collect
on Shutdown
seems OK, however the cleanup loop seems really icky. What happens if you have a __del__
method that relies on a global and it gets deleted first? Then the __del__
method would throw. A (maybe?) reasonable example where this would occur would be:
_Foo_count = 0
class Foo:
def __init__(self, name):
global _Foo_count
_Foo_count += 1
self.name = name
print("Creating instance of Foo (Total: {})".format(_Foo_count))
def __del__(self):
global _Foo_count
_Foo_count -= 1
print("Deleting i instance of Foo (Total: {})".format(_Foo_count))
It may be unfortunate, but in the particular case of resource cleanup, I think the burden should fall on the Python developer. For example, if someone opens up a file and wants to make sure it gets flushed to disk, then they should use a with
statement or call close
themselves (I guess that would apply to all io
objects). Similarly, if they really want the __del__
to get invoked, then they should at a minimum delete their variable.
This would mean that in https://github.com/IronLanguages/ironpython3/issues/985 the user would be expected to del foo
. Similarly for the io
examples in https://github.com/IronLanguages/ironpython3/issues/938. However, the sys.stdout.write
example in https://github.com/IronLanguages/ironpython3/issues/938 is a perfectly reasonable expectation and we should definitely support that one.
It looks like .NET Framework does a GC collect on app shutdown, but .NET Core does not so unfortunately this means these sorts of problems may be more frequent on .NET Core. I'm not sure if it's possible to mimic the .NET Framework behavior with .NET Core?
I would be happy to be convinced otherwise... @BCSharp any thoughts on the subject?
Yes this is really an issue and a better solution is needed.
BTW, It's probably not just missing a GC at the process exit for .NET Core. I tried renaming the Main
to CmdExec
in PythonConsoleHost
and add
public static int Main(string[] args) {
int code = CmdExec(args);
GC.Collect();
GC.WaitForPendingFinalizers();
return code;
}
so that local variable shouldn't be the reason for the object not being collected. But the __del__
of the global var is still not called.
It seems there's a more complicated difference between .NET Framework and .NET Core.
From https://docs.microsoft.com/en-us/dotnet/api/system.object.finalize#how-finalization-works.
The garbage collector then calls the Finalize method automatically under the following conditions: On .NET Framework only, during shutdown of an application domain, unless the object is exempt from finalization. During shutdown, even objects that are still accessible are finalized.
I wrote about finalizers before, so here is just the bottom line: every time I see that finalizers are used as a mechanism for anything else than preventing unmanaged resource leak, I get a sense that something is icky.
In a way I am not surprised that .NET Core turns out not to run all existing finalizers on shutdown. Since their job is to release unmanaged resources, on process termination the OS will do all that cleanup anyway. And since there is no AppDomain
support, no wonder the finalizers are being skipped.
IronPython uses the .NET garbage collection mechanism, which is different than CPython's, so there are bound to be differences. Since the specific behaviour of GC is not part of the language, proper Python code should not be written in a way that it depends on it. In practice, however, it does. To quote @slozier:
While it's bad code the fact that it works with CPython means it's more than likely used in the wild...
So another thing that works with CPython is that __del__
methods, if present, will always be called on every instantiated object, even during shutdown. That is a strong case to try to support it here as well. Interestingly, however, CPython does not guarantee that on shutdown, the __del__
method on one object, when accessing another object, will be run before the accessed object is __del__
-ed, although it does its best to follow this rule. Consider this:
class A:
def __init__(self, name, ref):
self.name = name
self.ref = ref
self.del_called = False
def __del__(self):
print("Deleting {}, is del already called? {}; referring to {}, is that object already deleted? {}".format(
self.name, self.del_called, self.ref.name, self.ref.del_called))
self.del_called = True
a1 = A("a1", None)
a2 = A("a2", a1)
a1.ref = a2
Deleting a1, is del already called? False; referring to a2, is that object already deleted? False
Deleting a2, is del already called? False; referring to a1, is that object already deleted? True
That gave me an idea that builds on the cleanup solution of @jameslan: what if instead actually deleting variables from the module namespace, on shutdown we simply try-invoke __del__
on each of them? This may not be as good as CPython in terms of ordering the calls, but going depth-first may be a reasonable approximation. And of course we would need precautions that __del__
is invoked only once per instance.
@BCSharp The example you designed is a circular reference and there's no ideal order of calling. But sometimes it needs the correct order:
log = open('log.txt', 'w')
class Foo:
def __del__(self):
log.write('__del__\n')
foo = Foo()
After calling __del__
of a FileIO
object, the file is closed and can't write anymore. So log
should be del after foo
.
I'm wondering whether or how .NET Framework figuring out the order, and how CPython is handling this.
Here's a thought I had this morning. Instead of trying to delete the world, would it not be enough for practical cases to release references to the __main__
module (e.g. remove it from sys.modules
) and letting the GC do it's thing?
example you designed is a circular reference and there's no ideal order of calling
Yes, I was curious what CPython would do in such "impossible" scenario. It had basically two choices:
- Skip the second
__del__
call to prevent a deleted object from being accessed. - Allow access to the deleted object to ensure that all
__del__
methods are being called.
If CPython chose 1, I would agree with @slozier that we just let it be as is in IronPython, and put the burden on the developer to explicitly del
anything that they want __del__
to be called on. But with choive 2 I argue that it is more in line with CPython to try to make sure than all __del__
methots are being called, even if some may result in an exception (which we will have to catch unconditionally to ensure that subsequent __del__
methods are being called).
I agree it is not perfect so this is just a heuristic to make the most common scenarios to work. To handle the example for a file object, we could add a rule that objects of types defined lower in module dependency are deleted later than objects of types defined higher up.
To make it work perfectly (excluding circular references), we would need to know and trace all existing references and partial-order them. I believe this is what CPython does. But we don't have such information, although obviously the .NET GC does.
would it not be enough for practical cases to release references to the
__main__
module (...) and letting the GC do it's thing?
I think it is worth investigating. It may be better, or it may be worse. The GC has all the information of referential dependencies, so it is possible for it to do partial ordering and finalize objects in a proper sequence like CPython does. But I have never read any explicit guarantees or even hints that this is what the GC does. On the contrary, I remember warnings against accessing any references an object may hold to other managed objects, from within its finalizer, because they may be gone already. It doesn't mean that the GC doesn't do its best, like CPython. It may still do so, and the exceptions only occur for circular references. If so, we are lucky. But it is also possible that the GC assumes the finalizers only dispose unmanaged resources and totally disregards any ordering, for performance reasons. Bad luck then, even worse than with our heuristics. It is also possible that .NET Framework does orderly finalization and .NET Core not. Or that it changes in the future.
From https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/basic-concepts#automatic-memory-management
Like other languages that assume the existence of a garbage collector, C# is designed so that the garbage collector may implement a wide range of memory management policies. For instance, C# does not require that destructors be run or that objects be collected as soon as they are eligible, or that destructors be run in any particular order, or on any particular thread.
Idem:
Under normal circumstances the destructor for the object is run once only, though implementation-specific APIs may allow this behavior to be overridden.