Joris Van den Bossche comments

Results: 844 comments by Joris Van den Bossche

Possible encodings (memory layout) for array of geometries

@jnh5y I am not aware of a good description / docs about GEOS' memory model, and I am also not an expert on GEOS' inner details. So probably the best...

Clear notebooks?

Even when there are no exercises, you can remove the content so people have to follow along with running the code. Of course, if you want them to focus mainly...

ENH: read a list of GIS files into chunks

I suppose we could actually even start with a `dask_geopandas.read_file` that only supports this use case, as it seems simpler than chunking one file (#11). In dask the logic behind...

BUG: to_parquet(append=True) raises ValueError Appended dtypes differ

Yeah, so the problem is that the *dataframe* has "geometry" dtype, while what we actually write is "object" dtype (binary type in arrow/parquet). I ran into essentially the same problem...
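The mismatch described above can be illustrated with a minimal, library-free sketch. The names `check_append` and `as_written` are hypothetical and are not the dask or dask-geopandas implementation. The sketch only assumes that the append step compares dtypes, and that the "geometry" column is serialized as "object" (binary) on disk.

```python
# Hypothetical sketch: dtype comparison on append, not the actual dask code.

def check_append(existing_dtypes, new_dtypes):
    """Raise if the dtypes of the new data differ from the existing dataset."""
    if existing_dtypes != new_dtypes:
        raise ValueError("Appended dtypes differ")


def as_written(dtypes):
    """Map in-memory dtypes to what would actually be written (assumption:
    the geometry column is stored as binary, i.e. "object" dtype)."""
    return {col: ("object" if dt == "geometry" else dt) for col, dt in dtypes.items()}


on_disk = {"id": "int64", "geometry": "object"}      # dtypes of the existing dataset
in_memory = {"id": "int64", "geometry": "geometry"}  # dtypes of the dataframe to append

# Comparing the in-memory dtypes directly fails ...
try:
    check_append(on_disk, in_memory)
    naive_ok = True
except ValueError:
    naive_ok = False

# ... while comparing the dtypes as they would be written succeeds.
check_append(on_disk, as_written(in_memory))
```

In this sketch the fix is to compare against the serialized dtypes rather than the in-memory ones. Whether that is the right fix for the actual bug is not something the excerpt settles.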

Missing geometries

From the point of view of the storage (Arrow memory layout, Parquet file format), missingness in general is certainly supported, I think, and several missingness levels can also be possible...

Missing geometries

> Theoretically the innermost child array (a big flat buffer of doubles) containing coordinate values can also be nullable and have null elements, but I think here NaN would be...
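To make the different levels of missingness concrete, here is a small pure-Python sketch of a nested layout: a flat coordinate buffer, per-geometry offsets, and a per-geometry validity list. The layout and the `classify` helper are illustrative assumptions, not the Arrow API or any agreed specification.

```python
import math

# Innermost child: a flat buffer of doubles, interleaved as x, y.
coords = [0.0, 0.0, 1.0, 1.0, math.nan, math.nan]

# Geometry i covers points offsets[i]:offsets[i + 1] (list-of-points layout).
offsets = [0, 2, 2, 2, 3]

# Validity at the geometry level: False marks a null (missing) geometry.
validity = [True, True, False, True]


def classify(i):
    """Return which kind of (non-)missingness geometry i represents."""
    if not validity[i]:
        return "null"
    start, end = offsets[i], offsets[i + 1]
    if start == end:
        return "empty"
    points = [(coords[2 * j], coords[2 * j + 1]) for j in range(start, end)]
    if any(math.isnan(x) or math.isnan(y) for x, y in points):
        return "has NaN coordinates"
    return "valid"


kinds = [classify(i) for i in range(len(validity))]
```

Here a null geometry, an empty geometry, and a geometry with NaN coordinates are three distinct states, each encoded at a different level of the layout.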

BUG: MacOS Arm64 Pyogrio + GeoPandas feather I/O issues via pip

Some time ago I diagnosed an issue with the fiona + arrow combination (https://github.com/conda-forge/gdal-feedstock/issues/592) where importing fiona messed up some symbols for pyarrow. But that was the other way around,...

ENH: add read_postgis

One question that comes up here (and it's the same question for spatial repartitioning a dask.dataframe to conform to given regions): how to deal with possible duplicates? If you specify...
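The duplicates question can be seen in a small pure-Python sketch using bounding boxes. The region and feature names are made up for illustration, and assigning by center point is only one possible way to avoid duplicates, not necessarily the approach discussed in the thread.

```python
# Boxes are (minx, miny, maxx, maxy).

def intersects(a, b):
    """True if two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains_point(box, pt):
    """Half-open containment, so a point on a shared edge falls in one region."""
    return box[0] <= pt[0] < box[2] and box[1] <= pt[1] < box[3]


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


regions = {"west": (0, 0, 10, 10), "east": (10, 0, 20, 10)}
features = {"a": (1, 1, 2, 2), "b": (8, 4, 12, 6), "c": (15, 5, 16, 6)}

# Assigning by intersection: feature "b" straddles the boundary and is duplicated.
by_intersection = {
    name: [f for f, box in features.items() if intersects(box, region)]
    for name, region in regions.items()
}

# Assigning by a single representative point: every feature lands in one region.
by_center = {
    name: [f for f, box in features.items() if contains_point(region, center(box))]
    for name, region in regions.items()
}
```

With intersection-based assignment, "b" appears in both partitions. With the center-point rule it appears only once, at the cost that a partition no longer holds every feature that touches its region.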

ENH: add read_postgis

> I am just afraid that it can be expensive. When starting from an existing dask.dataframe (not necessarily in memory; it can also be backed by reading from e.g. Parquet),...

ENH: add read_postgis

> Are we actually able to put this logic to the SQL query? If we can*not* (and thus doing the "dumb reading from PostGIS and filtering/repartitioning on the dask side"),...