Python bindings are missing a clean way to close datasets (also, support for context managers)
I just came across an inconsistency. Datasets, once opened, cannot be closed directly via the Python bindings. The Dataset class has no close or similar method. Likewise, context managers (via __enter__ and __exit__) are not supported.
There are a couple of commonly recommended workarounds, the most prominent one being the use of del. Long story short: This is a notoriously unreliable way of hoping for Python's garbage collector to do the right thing™ (see warning in Python's official documentation).
Interestingly, GDAL knows how to close datasets properly as can be seen here in rasterio, although this appears to be an internal / "private" API.
Have you already seen https://gdal.org/api/python_gotchas.html#saving-and-closing-datasets-datasources?
No, but it is a variation of what this issue is about:
To save and close GDAL raster datasets or OGR vector datasources, the object needs to be dereferenced, such as setting it to None, a different value, or deleting the object. If there are more than one copies of the dataset or datasource object, then each copy needs to be dereferenced. [...] The last dereference to the raster dataset writes the data modifications and closes the raster file.
You need to manually de-reference the dataset completely. Good luck with that in modern Python. Then, you must literally hope for the garbage collector :) I can imagine tons of ways of how this can fail, effectively creating a non-deterministic memory leak. Besides, it has become a common thing to turn the GC off for performance reasons, but that's yet another story.
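To make the concern concrete, here is a minimal sketch that does not use GDAL at all. `FakeDataset` is a hypothetical stand-in for any object whose file is only closed when the object is finalized; the function names are made up for illustration. It shows two ways the "just dereference it" advice can silently fail: a second reference held elsewhere, and a reference cycle while the garbage collector is switched off.

```python
import gc


class FakeDataset:
    """Hypothetical stand-in: the 'file' is closed only when the object is finalized."""

    closed = []  # names of datasets whose finalizer has run

    def __init__(self, name):
        self.name = name

    def __del__(self):
        FakeDataset.closed.append(self.name)


def demo_extra_reference():
    ds = FakeDataset("a.tif")
    cache = {"last": ds}      # a second reference, e.g. held by a helper or a cache
    del ds                    # removes only one name, not the object
    still_open = "a.tif" not in FakeDataset.closed
    cache.clear()             # dropping the last reference finalizes it (on CPython)
    return still_open, "a.tif" in FakeDataset.closed


def demo_reference_cycle():
    was_enabled = gc.isenabled()
    gc.disable()              # as some applications do for performance reasons
    try:
        ds = FakeDataset("b.tif")
        ds.self_ref = ds      # reference cycle: refcount never drops to zero
        del ds
        open_before_collect = "b.tif" not in FakeDataset.closed
        gc.collect()          # only an explicit collection breaks the cycle
        return open_before_collect, "b.tif" in FakeDataset.closed
    finally:
        if was_enabled:
            gc.enable()
```

In both functions the first returned value is `True`: the dataset is still open after `del`. When it finally gets closed depends on reference counting (a CPython implementation detail) or on a collection pass that may never run if the GC is disabled.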
Am I misunderstanding something?
No, you understood it right. I just wanted to point out that this is not a new observation and that a workaround is documented in the GDAL documentation itself, not only on gis.stackexchange.
Let's see if developers can tell why the situation is what it is, and what it would mean in real life to introduce a clean way to close.
I agree with @s-m-e, this is a dangerous oversight, as the garbage collector offers no guarantee about when resources will be freed.
Why isn't the GDALClose() function exposed in the Python API? Is there a technical problem preventing it? Exposing this function would at least allow users to close their datasets properly and even implement their own wrapper with an __exit__() method for use in a with block.
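For illustration, such a user-side wrapper could look roughly like the sketch below. It is not tied to GDAL: `ManagedDataset` is a hypothetical name, and `close_func` stands for whatever explicit close call would be available (for example an exposed GDALClose(), which is the assumption here).

```python
class ManagedDataset:
    """Hypothetical wrapper that closes a dataset deterministically in a with block."""

    def __init__(self, dataset, close_func):
        self._dataset = dataset
        self._close_func = close_func  # assumed: an explicit close call for the dataset

    def __enter__(self):
        return self._dataset

    def __exit__(self, exc_type, exc_value, traceback):
        # Close explicitly instead of waiting for the garbage collector.
        self._close_func(self._dataset)
        self._dataset = None
        return False  # do not swallow exceptions raised inside the with block
```

Usage would be along the lines of `with ManagedDataset(ds, close_func) as ds: ...`. The close call runs when the block exits, including when the block raises.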
Why isn't the GDALClose() function exposed in the Python API? Is there a technical problem preventing it?
The technical problem is that, for a clean resolution, we'd also need to make sure that any further call to this Python Dataset object (and any object that it owns indirectly through the corresponding C++ GDALDataset) doesn't lead to a crash. That would likely involve quite a lot of boilerplate in the SWIG .i files.
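A rough pure-Python sketch of the kind of guarding described here, to show where the boilerplate comes from. This is not how the SWIG bindings are implemented; `GuardedDataset`, `GuardedBand`, `requires_open` and the dict used as a handle are all hypothetical names chosen for illustration.

```python
import functools


def requires_open(method):
    """Raise a Python exception instead of touching a closed handle."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self._handle is None:
            raise RuntimeError("dataset is closed")
        return method(self, *args, **kwargs)

    return wrapper


class GuardedDataset:
    def __init__(self, handle):
        self._handle = handle  # stand-in for the underlying native dataset

    def close(self):
        self._handle = None    # idempotent: closing twice is harmless

    @requires_open
    def band_count(self):
        return self._handle["bands"]

    @requires_open
    def get_band(self, index):
        return GuardedBand(self, index)


class GuardedBand:
    """Object owned by the dataset: it must check the owner's state too."""

    def __init__(self, dataset, index):
        self._dataset = dataset
        self._index = index

    def describe(self):
        if self._dataset._handle is None:
            raise RuntimeError("parent dataset is closed")
        return f"band {self._index} of {self._dataset._handle['bands']}"
```

Every method of the dataset, and of every object obtained from it, needs a check like this, which is why the amount of boilerplate grows quickly.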
Is this resolved by https://github.com/OSGeo/gdal/pull/8454?
yes