
Opening same GPKG file/layer for write => mysterious bug

culebron opened this issue on Dec 19, 2018 · 6 comments

I wrote a class to stream dataframes into a GeoPackage, and by mistake it opened the file for writing twice. It would successfully write the data, but would crash as the program was terminating.

ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_insert_feature_count_sample" AFTER INSERT ON "sample" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count + 1 WHERE lower(table_name) = lower('sample'); END;) failed: trigger "trigger_insert_feature_count_sample" already exists
ERROR 1: sqlite3_exec(CREATE TRIGGER "trigger_delete_feature_count_sample" AFTER DELETE ON "sample" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('sample'); END;) failed: trigger "trigger_delete_feature_count_sample" already exists
ERROR 1: Spatial index already existing
Traceback (most recent call last):
  File "fiona/_err.pyx", line 201, in fiona._err.GDALErrCtxManager.__exit__
fiona._err.CPLE_AppDefinedError: b'Spatial index already existing'
Exception ignored in: 'fiona._shim.gdal_flush_cache'
Traceback (most recent call last):
  File "fiona/_err.pyx", line 201, in fiona._err.GDALErrCtxManager.__exit__

I thought it must have had something to do with threads created in the background. But it turned out that a condition was checked the wrong way, and fiona.open ran again and again on every chunk.

Here's code that reproduces this.

import fiona
from collections import OrderedDict

schema = {'geometry': 'Point', 'properties': OrderedDict()}
_handler = fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema)
row = {'geometry': {'type': 'Point', 'coordinates': (2, 2)}, 'properties': {}}
data = [row, row]

_handler.writerecords(data)

# The first handler is never closed; opening the same file/layer
# for writing a second time is what triggers the errors at exit.
_handler = fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema)
_handler.writerecords(data)

_handler.close()

My suggestion would be to check whether the layer already exists (or, if possible, whether it is already open for writing) and at least show a warning, something along the lines of the sketch below. Without it I was left thinking the problem was the data not being flushed, or the file being closed too quickly.
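
A rough sketch of the kind of check I mean, using the existing fiona.listlayers API (the helper name and the warning text are just made up for illustration):

import os
import warnings
import fiona

def open_gpkg_for_write(path, schema, layer, **kwargs):
    # Hypothetical guard: warn if the target layer already exists
    # before opening it in 'w' mode, which will overwrite it.
    if os.path.exists(path) and layer in fiona.listlayers(path):
        warnings.warn('layer %r in %r already exists' % (layer, path))
    return fiona.open(path, 'w', driver='GPKG', schema=schema, layer=layer, **kwargs)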

culebron avatar Dec 19 '18 16:12 culebron

Can you please attach an example of a tiny *.gpkg that reproduces this issue?

drnextgis avatar Dec 20 '18 22:12 drnextgis

You don't need one; it's created right there in the code.

culebron avatar Dec 21 '18 21:12 culebron

@culebron this is an interesting issue. The second call to fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema) should delete the layer created in the first. We're fortunate that this only results in a Python exception and doesn't crash the Python process itself.

I encourage you to only write to datasets within the context of a with fiona.open() block. This will guard you against issues like the one you've reported, which I'm not sure how to solve at the moment.
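
For example, the writes in your script can be done like this; the block closes the dataset exactly once when it exits:

with fiona.open('sample.gpkg', 'w', driver='GPKG', schema=schema) as dst:
    dst.writerecords(data)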

sgillies avatar Dec 21 '18 22:12 sgillies

Yep. In my case a with block is not applicable, because open and close happen in different functions. So the options are:

  • use contextlib.ExitStack (roughly as in the sketch below)
  • open/close in append mode
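
A rough outline of the ExitStack variant; the class and method names are just from my own streaming code, not Fiona's API:

from contextlib import ExitStack
import fiona

class GpkgStreamer:
    """Opens the dataset in one function, closes it in another."""

    def __init__(self, path, schema):
        self._stack = ExitStack()
        self._dst = self._stack.enter_context(
            fiona.open(path, 'w', driver='GPKG', schema=schema))

    def write_chunk(self, records):
        self._dst.writerecords(records)

    def close(self):
        # Exits the fiona.open context, flushing and closing the file once.
        self._stack.close()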

culebron avatar Dec 22 '18 00:12 culebron

Just googled this bug myself another time while writing multiprocessing code. LOL.

culebron avatar Nov 02 '19 13:11 culebron

@culebron this remains a complicated issue. If Fiona datasets don't call GDALClose when they are deallocated, data will not be written to disk and/or memory will leak in some cases. That is what happens in your script: GDALClose is called twice for the same file, and the second call happens very late, in an unexpected way. Ideally, the GPKG driver should lock the file if it's not able to gracefully handle a double close, don't you think? I wonder if the issue isn't better solved in GDAL/OGR... maybe one of us should ask on gdal-dev to see what Even's perspective is.

sgillies avatar Nov 02 '19 15:11 sgillies