arrow-julia
Reading only a subset of columns
Please correct me if this is possible already. I looked through the source code and the documentation and did not find a clear way to do this: basically, I want to read a FeatherV2 file, but not mmap every single column. I already know which columns I need and I'd like to tell Arrow.Table the subset of columns I want read into memory.
This is similar to this issue on Feather.jl.
This seems to be possible in the R arrow package using col_select.
Hey @CarlColglazier, thanks for opening an issue. We could probably support keyword arguments like select and drop, but note that it wouldn't change how much memory is "mmapped". Arrow tables are stored in a single memory blob and there isn't really a way to only mmap a few columns. You still have to read the header/metadata to figure out the offsets of specific columns into the data.
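To illustrate the point about offsets, here is a minimal sketch using only Julia's Mmap standard library. The file layout is a made-up toy (a small Int64 header followed by two Float64 columns), not the actual Arrow format, and names such as col_a, header, and blob are purely illustrative. It maps the whole file as one byte blob, then uses the header offsets to copy out a single column.

```julia
using Mmap

# Toy layout (an assumption for illustration, NOT the Arrow format):
# header = [number of rows, byte offset of column a, byte offset of column b],
# followed by the two Float64 columns stored back to back.
n = 4
col_a = Float64[1, 2, 3, 4]
col_b = Float64[10, 20, 30, 40]
header = Int64[n, 24, 24 + n * sizeof(Float64)]

path, io = mktemp()
write(io, header)
write(io, col_a)
write(io, col_b)
close(io)

# Map the whole file as a single Vector{UInt8}.
blob = Mmap.mmap(path)

# The header has to be read first to learn where each column starts.
meta = reinterpret(Int64, blob[1:24])
nrows, off_b = meta[1], meta[3]

# Only now is column b copied into an ordinary Julia array.
b_bytes = view(blob, (off_b + 1):(off_b + nrows * sizeof(Float64)))
b = collect(reinterpret(Float64, b_bytes))
```

In this toy setup the mapping covers the whole file regardless of which column is wanted; the selection only decides which bytes get copied out afterwards.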
So, happy to support select/drop, since it can be convenient to only get back the columns you really need, but I just want to point out that I wouldn't expect there to be any real effect on memory/performance.
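In the meantime, a possible workaround is to open the table as usual and materialize only the columns of interest. This is a rough sketch, assuming Arrow.jl is installed; the file contents and the names wanted and subset are hypothetical, and it is not a built-in select/drop feature.

```julia
using Arrow   # third-party package Arrow.jl (assumed to be installed)

# Write a small example file so the sketch is self-contained.
path = tempname() * ".arrow"
Arrow.write(path, (a = [1, 2, 3], b = [1.5, 2.5, 3.5], c = ["x", "y", "z"]))

# Open the file; columns are accessed by name as properties of the table.
tbl = Arrow.Table(path)

# Hypothetical subset of column names to keep.
wanted = (:a, :c)

# Copy just those columns into ordinary Julia vectors.
subset = NamedTuple{wanted}(map(name -> collect(getproperty(tbl, name)), wanted))
```

The result is a plain NamedTuple of vectors holding only the requested columns; the remaining columns are never copied.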
I went through the Feather C++ source code, and it seems this hasn't been fixed yet in the upstream C++ API. Am I right?