projected_schema segfaults on a Vortex scanner
Describe the bug
Accessing projected_schema on a Vortex scanner segfaults:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fff96ab2d3c in __pyx_getprop_7pyarrow_8_dataset_7Scanner_projected_schema(_object*, void*) [clone .lto_priv.0] () from /home/extra/git/goodenv/.venv/lib/python3.13/site-packages/pyarrow/_dataset.cpython-313-x86_64-linux-gnu.so
Discovered in https://github.com/duckdb/duckdb-python/issues/187
To Reproduce
MRE:
import vortex as vx
vx.io.write(vx.array([{"col1": "a string"}]), 'foo.vortex')
x = vx.open('foo.vortex').to_dataset().scanner().projected_schema
Expected behavior
Expected it to behave like:
from pyarrow import parquet as pq
from pyarrow import dataset as ds
x = ds.dataset(pq.read_table("foo.parquet")).scanner().projected_schema
Which returns the schema.
Heh. Okay, it looks like we inherit from pyarrow.dataset.Scanner but never define a projected_schema. I’m not sure exactly how that turns into a segfault but the fix is clear: define that.
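For illustration, here is a rough sketch of the shape of that fix. It uses pure-Python stand-ins, not the real pyarrow or Vortex classes: BaseScanner, VortexScannerSketch, and the (name, type) tuple schema are all hypothetical, and the base property raising is only an assumed model of what goes wrong natively.

```python
# Stand-ins only: neither class below is the real pyarrow or Vortex type.
class BaseScanner:
    """Hypothetical stand-in for pyarrow.dataset.Scanner."""

    @property
    def projected_schema(self):
        # Assumption: the real base property reads native state that a
        # Python subclass never sets up. Here we raise instead of crashing.
        raise RuntimeError("native scanner state was never initialized")


class VortexScannerSketch(BaseScanner):
    """Hypothetical stand-in for the Vortex scanner subclass."""

    def __init__(self, schema, columns=None):
        self._schema = schema    # full schema as (name, type) pairs
        self._columns = columns  # projected column names, or None for all

    @property
    def projected_schema(self):
        # Override the inherited property so the base one is never reached.
        if self._columns is None:
            return list(self._schema)
        wanted = set(self._columns)
        return [(name, typ) for name, typ in self._schema if name in wanted]


scanner = VortexScannerSketch(
    [("col1", "string"), ("col2", "int64")], columns=["col1"]
)
projected = scanner.projected_schema
```

The point is only that the subclass answers projected_schema from state it owns, so the inherited getter is never invoked. The real change would return a pyarrow Schema.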
Can I take this up? It should be a straightforward change in the VortexScanner class.