projected_schema segfaults on a Vortex scanner

Open paultiq opened this issue 1 month ago • 2 comments

Describe the bug

Accessing the projected_schema property on a Vortex scanner segfaults:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fff96ab2d3c in __pyx_getprop_7pyarrow_8_dataset_7Scanner_projected_schema(_object*, void*) [clone .lto_priv.0] () from /home/extra/git/goodenv/.venv/lib/python3.13/site-packages/pyarrow/_dataset.cpython-313-x86_64-linux-gnu.so

Discovered in https://github.com/duckdb/duckdb-python/issues/187

To Reproduce

MRE:

import vortex as vx

vx.io.write(vx.array([{"col1": "a string"}]), 'foo.vortex')
x = vx.open('foo.vortex').to_dataset().scanner().projected_schema

Expected behavior

Expected it to behave like:

from pyarrow import parquet as pq
from pyarrow import dataset as ds

x = ds.dataset(pq.read_table("foo.parquet")).scanner().projected_schema

Which returns the schema.

Additional context

No response

paultiq, Nov 21 '25 19:11

Heh. Okay, it looks like we inherit from pyarrow.dataset.Scanner but never define a projected_schema. I’m not sure exactly how that turns into a segfault but the fix is clear: define that.

danking, Nov 21 '25 23:11

Can I take this up? It should be a straightforward change in the VortexScanner class.

sherlockbeard, Nov 24 '25 16:11
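
For anyone picking this up, here is a rough sketch of the shape of the fix suggested above: the subclass defines its own projected_schema instead of falling through to the base class property. Everything in it is hypothetical. StandInScanner is a pure-Python placeholder for pyarrow.dataset.Scanner, SketchVortexScanner is not the real VortexScanner, and the schema is a plain list of (name, type) pairs rather than a pyarrow.Schema. The idea that the base property reads state the subclass never initialised is only a plausible guess at the cause, not something confirmed in this thread.

```python
class StandInScanner:
    """Hypothetical stand-in for pyarrow.dataset.Scanner (not the real class)."""

    @property
    def projected_schema(self):
        # Assumption: the real property reads state owned by the base class.
        # A subclass that never sets that state up has nothing valid to read.
        # That is modelled here as an exception rather than a crash.
        raise RuntimeError("base scanner state was never initialised")


class SketchVortexScanner(StandInScanner):
    """Hypothetical subclass showing the shape of the fix."""

    def __init__(self, schema, columns=None):
        self._schema = schema      # list of (name, type) pairs, illustrative only
        self._columns = columns    # optional projection, None means all columns

    @property
    def projected_schema(self):
        # Override the inherited property so the base implementation is never hit.
        if self._columns is None:
            return self._schema
        return [(name, typ) for name, typ in self._schema if name in self._columns]


schema = [("col1", "string"), ("col2", "int64")]
full = SketchVortexScanner(schema).projected_schema
projected = SketchVortexScanner(schema, columns=["col1"]).projected_schema
print(full)
print(projected)
```

In the real class the property would presumably return a pyarrow.Schema built from the Vortex file's dtype and the requested projection. How that is obtained is left open here.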