pyogrio icon indicating copy to clipboard operation
pyogrio copied to clipboard

GDAL errors are also "printed" to stdout in addition to raised as Python exception

Open jorisvandenbossche opened this issue 2 years ago • 1 comments

A small example:

import pyogrio
import geopandas
from shapely.geometry import Point, LineString, Polygon, box, MultiPolygon

df = geopandas.GeoDataFrame(
    {"col": [1, 2, 3]},
    geometry=[Point(0, 0), LineString([(0, 0), (1, 1)]), box(0, 0, 1, 1)],
    crs="EPSG:4326",
)


In [3]: pyogrio.write_dataframe(df, "test_pyogrio_mixed.shp", driver="ESRI Shapefile")
ERROR 1: Attempt to write non-point (LINESTRING) geometry to point shapefile.
---------------------------------------------------------------------------
FeatureError                              Traceback (most recent call last)
<ipython-input-3-f50fe8b379fe> in <module>
----> 1 pyogrio.write_dataframe(df, "test_pyogrio_mixed_truly.shp", driver="ESRI Shapefile")

~/scipy/repos/pyogrio/pyogrio/geopandas.py in write_dataframe(df, path, layer, driver, encoding, geometry_type, **kwargs)
    248             crs = geometry.crs.to_wkt(WktVersion.WKT1_GDAL)
    249 
--> 250     write(
    251         path,
    252         layer=layer,

~/scipy/repos/pyogrio/pyogrio/raw.py in write(path, geometry, field_data, fields, layer, driver, geometry_type, crs, encoding, **kwargs)
    186         )
    187 
--> 188     ogr_write(
    189         str(path),
    190         layer=layer,

~/scipy/repos/pyogrio/pyogrio/_io.pyx in pyogrio._io.ogr_write()

FeatureError: Could not add feature to layer at index 1: Attempt to write non-point (LINESTRING) geometry to point shapefile.

In the above case, we detect that GDAL errors, get the error message and include that in the Python exception we raise. But, you can also see that the original GDAL error is still printed as well (the first line before the traceback, ERROR 1: ...).

This is not that a big deal, but I think ideally we would only have the exception (for example, if you catch the exception to handle it in some way, you still get the printed message). And, when using fiona, this doesn't happen (df.to_file("test_fiona_mixed.shp", driver="ESRI Shapefile") is the equivalent of the above pyogrio write, and this only gives a python exception).

I have been wondering why this is (we do use CPLErrorReset after getting the error type and message). Comparing to Fiona, the error handling itself is very similar (since copied from Fiona), but in Fiona there is in addition also a error handler registered (CPLPushErrorHandler):

https://github.com/Toblerity/Fiona/blob/9c8fe736d4381526905c6976c399a33d10e3ecbf/fiona/_env.pyx#L137-L155 https://github.com/Toblerity/Fiona/blob/9c8fe736d4381526905c6976c399a33d10e3ecbf/fiona/_env.pyx#L387-L395

jorisvandenbossche avatar Apr 29 '22 07:04 jorisvandenbossche

I noticed that the GDAL python SWIG bindings have this handler defined, which I suppose is basically what we want as well (hide errors logs because those are already handled as python exception):

static CPLErrorHandler pfnPreviousHandler = CPLDefaultErrorHandler;

static void CPL_STDCALL
PythonBindingErrorHandler(CPLErr eclass, int code, const char *msg )
{
  /*
  ** Generally we want to suppress error reporting if we have exceptions
  ** enabled as the error message will be in the exception thrown in
  ** Python.
  */

  /* If the error class is CE_Fatal, we want to have a message issued
     because the CPL support code does an abort() before any exception
     can be generated */
  if (eclass == CE_Fatal ) {
    pfnPreviousHandler(eclass, code, msg );
  }

  /*
  ** We do not want to interfere with non-failure messages since
  ** they won't be translated into exceptions.
  */
  else if (eclass != CE_Failure ) {
    pfnPreviousHandler(eclass, code, msg );
  }
  else {
    CPLSetThreadLocalConfigOption("__last_error_message", msg);
    CPLSetThreadLocalConfigOption("__last_error_code", CPLSPrintf("%d", code));
  }
}

(copying the snippet here, because doesn't allow to link to the line: https://github.com/OSGeo/gdal/blob/master/swig/python/extensions/ogr_wrap.cpp)

There is also a builtin CPLQuietErrorHandler (https://gdal.org/api/cpl.html#_CPPv420CPLQuietErrorHandler6CPLErr11CPLErrorNumPKc), but that also hides warning messages.

jorisvandenbossche avatar Apr 29 '22 07:04 jorisvandenbossche

This also affects warnings. Ideally, you'd be able to supress warnings using the stdlib, but since those are simply logged from gdal there is no way to do so. The annoying one I'm facing:

Warning 1: Value 113034553.399706051 of field AREA of feature 3746 not successfully written. Possibly due to too larger number with respect to field width

Oreilles avatar Mar 27 '23 21:03 Oreilles

@Oreilles did you figure out a way to solve that? I write the same file with fiona and it doesn't complain. But pyogrio does. Would love to write at the speed of pyogrio.

I assume this negatively impacts reading the files afterwards as well? Are the values NULL? Or are they truncated floats? Would be good to know what happens.

@brendan-ward any ideas?

codeananda avatar Jan 19 '24 15:01 codeananda

I could not fix the bug, however, I found it was possible to redirect the GDAL output elsewhere so as to not pollute my program output. Here is the code to just throw GDAL's output away (Change dev/null to any file path if you want to preserve it somehow):

from pyogrio import set_gdal_config_options

set_gdal_config_options({"CPL_LOG": "/dev/null"})

I assume this negatively impacts reading the files afterwards as well? Are the values NULL? Or are they truncated floats? Would be good to know what happens.

All values were already correctly written for me, the warning was unexpected which is why I decided to suppress it. Of course this comes with the downside that if a valid warning were to be emitted I would miss it.

Oreilles avatar Jan 19 '24 16:01 Oreilles

@codeananda which version of pyogrio are you using? Because my understanding is that the warnings issue should also be solved by https://github.com/geopandas/pyogrio/pull/242 (and I know in our CI we use pytest.mark.filterwarnings to ignore some of them)

jorisvandenbossche avatar Jan 19 '24 16:01 jorisvandenbossche

We had this specific warning in the geopandas tests as well, and added a filter for this (so filtering specifically this warning with python warnings.filterwarnings is certainly possible). See https://github.com/geopandas/geopandas/pull/3040#discussion_r1349713787 for some discussion.

I don't really understand why GDAL raises this warning, but our conclusion was also that it is probably harmless (you could open an issue about it upstream in osgeo/gdal)

jorisvandenbossche avatar Jan 19 '24 16:01 jorisvandenbossche

Wow thanks for the super speedy responses!

I'm using pyogrio 0.7.2.

Good to know that the values are actually written correctly.

@jorisvandenbossche maybe I'm missing something but it seems like #242 is related to suppressing warnings for tests... but are these warnings actually suppressed in the code?

codeananda avatar Jan 19 '24 17:01 codeananda