Joris Van den Bossche

844 comments by Joris Van den Bossche

Some additional points that might help in clarifying: - One of the core concepts of Arrow is that it is a *columnar* layout, which means all values in a...

I made a notebook exploring a bit more in detail the different memory representations (nested list vs struct) with a small example: https://nbviewer.jupyter.org/gist/jorisvandenbossche/dc4e98cf5c9fdbb64769716d046d0edf (feedback on how this can be illustrated...
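
To make the "nested list vs struct" idea concrete, here is a minimal plain-Python sketch (it is not taken from the linked notebook and does not use pyarrow). It assumes the general idea behind Arrow's variable-size list layout: one offsets buffer with `n + 1` entries plus flat child values, with the coordinates stored struct-style as separate `x` and `y` columns. The function names `encode_list_layout` and `decode_list_layout` are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def encode_list_layout(linestrings):
    """Flatten variable-length linestrings into an offsets list plus x/y columns.

    Hypothetical helper: mimics a list<struct<x, y>> layout with Python lists.
    """
    lengths = [len(line) for line in linestrings]
    # n + 1 offsets: linestring i spans coordinates offsets[i]:offsets[i + 1]
    offsets = [0, *accumulate(lengths)]
    # Struct-style storage: all x values contiguous, all y values contiguous
    xs = [x for line in linestrings for x, _ in line]
    ys = [y for line in linestrings for _, y in line]
    return offsets, xs, ys


def decode_list_layout(offsets, xs, ys):
    """Rebuild the nested Python lists from the flat columns."""
    return [
        list(zip(xs[start:end], ys[start:end]))
        for start, end in zip(offsets, offsets[1:])
    ]


linestrings = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],
]
offsets, xs, ys = encode_list_layout(linestrings)
roundtrip = decode_list_layout(offsets, xs, ys)
```

The point of the sketch is that the variable-length nesting lives entirely in `offsets`, while the coordinate values stay in contiguous flat columns.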

@kylebarron thanks for making this link! That's interesting ;) @trxcllnt if I am reading the comments in `cuspatial` source code correctly, the memory layout for polygons you are using is...

@srenoes AFAIK the data types that awkward array supports are very similar to the ones supported by Arrow. I think the physical representation for nested lists or struct arrays is...

> Hi all, I work on LocationTech GeoMesa, and we've got an existing implementation of (Multi){Point,LineString,Polygon}s using the nested array approach in Scala for Apache Arrow, Parquet, and Orc. @jnh5y...

> [@pramsey] Not a joke: much of what has been said above is reflected in the hoary old shapefile specification. https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf From quickly reading through the document, I think there...

@jnh5y thanks! That's useful (I see you also have versions of each for storing the coordinates as float vs double). In addition, I also saw https://www.geomesa.org/documentation/stable/user/datastores/analytic_queries.html#arrow-encoding mentioned now in the...

Trying to summarize some of the open questions: * Do we use a plain nested `ListArray` or a `StructArray` with custom fields? * I made a [notebook](https://nbviewer.jupyter.org/gist/jorisvandenbossche/dc4e98cf5c9fdbb64769716d046d0edf) earlier that compares...

Last week, @trxcllnt, @thomcom, @kylebarron and I had a productive chat. Trying to summarize here the points we discussed: - We think that Arrow's **nested list arrays** are a good...

The `ParquetDataset.schema` attribute is generally inferred (first checking `_metadata`, and otherwise `_common_metadata`, and otherwise the first file of the dataset), so in general "metadata" of the schema will not reflect...