Opening same GPKG file/layer for write => mysterious bug
I wrote a class to stream dataframes into a GeoPackage, and by mistake it opened the file for writing twice. It would successfully write the data, but would crash as the program was terminating.
ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_insert_feature_count_sample" AFTER INSERT ON "sample" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count + 1 WHERE lower(table_name) = lower('sample'); END;) failed: trigger "trigger_insert_feature_count_sample" already exists
ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_delete_feature_count_sample" AFTER DELETE ON "sample" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('sample'); END;) failed: trigger "trigger_delete_feature_count_sample" already exists
ERROR 1: Spatial index already existing
Traceback (most recent call last):
File "fiona/_err.pyx", line 201, in fiona._err.GDALErrCtxManager.__exit__
fiona._err.CPLE_AppDefinedError: b'Spatial index already existing'
Exception ignored in: 'fiona._shim.gdal_flush_cache'
Traceback (most recent call last):
File "fiona/_err.pyx", line 201, in fiona._err.GDALErrCtxManager.__exit__
I thought it must have had something to do with threads created in the background. But it turned out a condition was checked the wrong way, and fiona.open ran again and again on every chunk.
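For illustration only, here's a hypothetical sketch of that kind of mistake (the class and method names are mine, not the actual code): the guard tests the chunk instead of the handle, so every non-empty chunk reopens the same layer for write.

import fiona

class Streamer:
    def __init__(self, path, schema):
        self.path = path
        self.schema = schema
        self._handler = None

    def write_chunk(self, records):
        # Buggy: should be `if self._handler is None:` -- as written,
        # every non-empty chunk opens the same layer for write again.
        if records:
            self._handler = fiona.open(self.path, 'w', driver='GPKG',
                                       schema=self.schema)
        self._handler.writerecords(records)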
Here's code that reproduces this.
import fiona
from collections import OrderedDict

schema = {'geometry': 'Point', 'properties': OrderedDict()}

# First open for write; this handle is never explicitly closed.
_handler = fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema)

row = {'geometry': {'type': 'Point', 'coordinates': (2, 2)}, 'properties': {}}
data = [row, row]
_handler.writerecords(data)

# Second open of the same file/layer for write while the first handle
# is still alive; the errors show up when both are finally closed.
_handler = fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema)
_handler.writerecords(data)
_handler.close()
My suggestion is to check whether the layer already exists (or is already opened for writing, if possible) and at least show a warning. Otherwise, as happened to me, one suspects the data wasn't flushed or the file was closed too quickly.
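Until the library itself warns, a guard is possible in user code. A minimal sketch, assuming a hypothetical safe_open_write helper (fiona.listlayers is the real API; the helper and its warning policy are made up):

import os
import warnings

import fiona

def safe_open_write(path, schema, **kwargs):
    # Warn before opening a GeoPackage for write if it already contains
    # layers, instead of letting the failure surface at teardown.
    if os.path.exists(path):
        existing = fiona.listlayers(path)
        if existing:
            warnings.warn(f"{path} already contains layers: {existing}")
    return fiona.open(path, 'w', driver='GPKG', schema=schema, **kwargs)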
Can you please attach an example of a tiny *.gpkg to reproduce this issue?
You don't need one; it's created right there in the code.
@culebron this is an interesting issue. The second call to fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema)
should delete the layer created by the first. We're fortunate that this only results in a Python exception and doesn't crash the Python process itself.
I encourage you to only write to datasets within the context of a with fiona.open()
block. This will guard you from issues like the one you've reported, which I'm not sure how to solve at the moment.
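For reference, a minimal sketch of that pattern, reusing the schema and records from the snippet above:

import fiona
from collections import OrderedDict

schema = {'geometry': 'Point', 'properties': OrderedDict()}
row = {'geometry': {'type': 'Point', 'coordinates': (2, 2)}, 'properties': {}}

# The with block closes (and flushes) the dataset exactly once,
# even if writerecords raises.
with fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema) as dst:
    dst.writerecords([row, row])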
Yep. In my case a with block is not applicable, because open and close happen in different functions. So the options are:
- use contextlib.ExitStack (sketch below)
- open/close in append mode
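A minimal sketch of the ExitStack option, with a hypothetical streaming class where open and close happen in different methods:

import contextlib

import fiona

class GpkgStreamer:
    def __init__(self, path, schema):
        # ExitStack carries the open context across method boundaries,
        # where a single with block couldn't wrap all the writes.
        self._stack = contextlib.ExitStack()
        self._dst = self._stack.enter_context(
            fiona.open(path, 'w', driver='GPKG', schema=schema))

    def write_chunk(self, records):
        self._dst.writerecords(records)

    def close(self):
        # Exits every registered context, closing the dataset exactly once.
        self._stack.close()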
Just googled this bug myself again while writing multiprocessing code. LOL.
@culebron this remains a complicated issue. If Fiona datasets don't call GDALClose when they are deallocated, data will not be written to disk and/or memory will leak in some cases. And this is what happens in your script: GDALClose is called twice for the same file, the second time very late and in an unexpected way. Ideally, the GPKG driver should lock the file if it's not able to gracefully handle a double close, don't you think? I wonder if the issue isn't better solved in GDAL/OGR... maybe one of us should ask on gdal-dev to see what Even's perspective is.
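In the meantime, user code can at least fail fast instead of failing at teardown. A minimal sketch (the registry and function names are hypothetical, not Fiona API):

import fiona

_write_handles = {}

def open_write_once(path, **kwargs):
    # Refuse a second write handle on the same path while the first is
    # still open, turning the late GDALClose errors into an immediate,
    # debuggable exception.
    if path in _write_handles:
        raise RuntimeError(f"{path} is already open for writing")
    _write_handles[path] = fiona.open(path, 'w', **kwargs)
    return _write_handles[path]

def close_write(path):
    _write_handles.pop(path).close()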