Kyle Barron

1642 comments by Kyle Barron

Since LanceSchema has pyarrow interop anyway (https://github.com/lancedb/lance/blob/fa089be2bf0457e6bf8a92c6ef67e43e4c0c3177/python/src/schema.rs#L47-L61), it might as well expose/ingest C schemas too. You could easily reuse the pyarrow dunders if you don't want to manage the Rust...
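A minimal sketch of what "reusing the pyarrow dunders" could look like on the Python side, assuming the schema object can already produce a pyarrow schema. `SchemaWrapper`, its `to_pyarrow` callable, and `FakePyArrowSchema` are hypothetical names for illustration and are not Lance's API; the only real piece is the `__arrow_c_schema__` method name from the Arrow PyCapsule interface.

```python
class SchemaWrapper:
    """Hypothetical schema type that already knows how to become a pyarrow schema."""

    def __init__(self, to_pyarrow):
        # `to_pyarrow` is an assumed zero-argument callable returning an object
        # that implements `__arrow_c_schema__` (for example a pyarrow.Schema).
        self._to_pyarrow = to_pyarrow

    def __arrow_c_schema__(self):
        # Delegate to the pyarrow object's own dunder instead of building the
        # C schema capsule by hand.
        return self._to_pyarrow().__arrow_c_schema__()


class FakePyArrowSchema:
    """Stand-in for a pyarrow schema so the sketch runs without pyarrow installed."""

    def __arrow_c_schema__(self):
        # A real implementation returns a PyCapsule named "arrow_schema".
        return "capsule-placeholder"


wrapper = SchemaWrapper(FakePyArrowSchema)
exported = wrapper.__arrow_c_schema__()
```

With a wrapper like this, any consumer that looks for `__arrow_c_schema__` can ingest the schema without knowing about the wrapper type.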

@H-Plus-Time let me know what you think of #57. I think it's a bit cleaner

No obvious hurdles. I can give you pointers if you get stuck. You should be able to stream over the batches without loading the entire file into memory

In particular, you should be able to use a [`GeoParquetRecordBatchReader`](https://docs.rs/geoarrow/0.4.0-beta.2/geoarrow/io/parquet/struct.GeoParquetRecordBatchReader.html) to iterate over record batches from the Parquet file. You can wrap each one as a [`Table`](https://docs.rs/geoarrow/0.4.0-beta.2/geoarrow/table/struct.Table.html), which implements [`GeozeroDatasource`](https://docs.rs/geoarrow/0.4.0-beta.2/geoarrow/table/struct.Table.html#impl-GeozeroDatasource-for-Table).

In https://github.com/geopandas/pyogrio/pull/206#issuecomment-1496886332 there was some discussion of whether to return a `RecordBatchReader` (or a subclass). I don't recall a discussion of separating it from pyarrow. Overall this is exciting! I'm...

> I think the main question is whether we use the numpy C API, because in that case it's probably more complicated to have it optional. I was thinking the...

My guess had been that those symbols would only be looked up when code that needed them was called, but I really don't know.

> It seems difficult to fully prove that the capsule works in the absence of pyarrow

It looks like you can use [nanoarrow's python bindings](https://github.com/apache/arrow-nanoarrow/tree/main/python#array-streams) for that, and the `nanoarrow.c_array_stream()`...

For now you can probably just assert that the returned object is not an instance of any known pyarrow class and assert that it works with `nanoarrow.c_array_stream()`?
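A rough sketch of the first half of that check, using only the standard library: it asserts that the object's class (and every base class) lives outside the `pyarrow` package and that it exposes the Arrow PyCapsule stream dunder. The helper names and `FakeStream` are hypothetical stand-ins; in a real test the object would come from the reader under test and would then also be passed to `nanoarrow.c_array_stream()` as suggested above.

```python
def is_pyarrow_object(obj) -> bool:
    """Return True if obj's class, or any of its base classes, is defined in pyarrow."""
    return any(
        cls.__module__.split(".")[0] == "pyarrow" for cls in type(obj).__mro__
    )


def check_pyarrow_free_stream(obj) -> None:
    """Assert that obj is not a pyarrow type but still exports an Arrow C stream."""
    assert not is_pyarrow_object(obj), "expected a non-pyarrow object"
    assert callable(getattr(obj, "__arrow_c_stream__", None)), (
        "object does not implement __arrow_c_stream__"
    )


class FakeStream:
    """Hypothetical stand-in for the returned reader; not a real implementation."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real reader returns a PyCapsule named "arrow_array_stream" here.
        raise NotImplementedError("stand-in only")


check_pyarrow_free_stream(FakeStream())
```

Walking the MRO rather than checking a fixed list of classes also catches subclasses of pyarrow types, at the cost of relying on `__module__` naming.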

> How do you see that working out downstream, e.g., in GeoPandas?

Any pyarrow-based downstream library can convert the stream to a pyarrow by passing it to a `RecordBatchReader` I...