Tanmay Mohapatra
Having the processing functions work on `Vector{UInt8}` will be useful for files that fit into memory. This would also work for files on disk that can be memory...
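The Parquet container layout fits this well. A file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the Thrift-encoded footer. Below is a minimal sketch that uses a hypothetical helper `footer_range` (not a Parquet.jl function) to locate that footer in any `AbstractVector{UInt8}`. Because it only indexes into the vector, the same code serves bytes loaded with `read(path)` and an array mapped with `Mmap.mmap(path)`.

```julia
# Magic bytes that open and close every Parquet file.
const PAR1 = b"PAR1"

# Hypothetical helper (not a Parquet.jl function): return the 1-based index
# range of the Thrift-encoded file metadata inside `bytes`. Accepts any
# AbstractVector{UInt8}, e.g. `read(path)` or `Mmap.mmap(path)`.
function footer_range(bytes::AbstractVector{UInt8})
    n = length(bytes)
    # Smallest possible layout: magic (4) + length field (4) + magic (4).
    n >= 12 || error("too short to be a Parquet file")
    bytes[1:4] == PAR1 || error("missing leading PAR1 magic")
    bytes[n-3:n] == PAR1 || error("missing trailing PAR1 magic")
    # The metadata length is a 4-byte little-endian integer stored just
    # before the trailing magic.
    len = Int(bytes[n-7]) | (Int(bytes[n-6]) << 8) |
          (Int(bytes[n-5]) << 16) | (Int(bytes[n-4]) << 24)
    stop = n - 8
    start = stop - len + 1
    # The metadata must not overlap the leading magic.
    start >= 5 || error("metadata length $len does not fit in $n bytes")
    return start:stop
end
```

For a file on disk, `footer_range(Mmap.mmap(path))` (after `using Mmap`) touches only the pages that are actually indexed. `footer_range(read(path))` loads the whole file first.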
Yes, the reads need to be buffered by the abstraction, of course. And most of the data accesses in this package are actually for reasonably large chunks of data, with...
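Here is a minimal sketch of that kind of abstraction. The names `ByteSource`, `VectorSource`, `IOSource` and `read_range` are hypothetical, and none of them is Parquet.jl API. Every access is a request for a byte range. In-memory data answers with a zero-copy view. An `IO` answers with one seek plus one bulk read per request, which is cheap when the requests are large chunks.

```julia
# Hypothetical abstraction, not part of Parquet.jl: a source of bytes that
# is only ever asked for whole ranges.
abstract type ByteSource end

# In-memory (or memory-mapped) data: a range is just a view, no copying.
struct VectorSource{V<:AbstractVector{UInt8}} <: ByteSource
    data::V
end

# Stream-backed data (file, network): each range is one seek plus one bulk
# read into a buffer sized exactly for the request.
struct IOSource{T<:IO} <: ByteSource
    io::T
end

# Return `len` bytes starting at the 0-based byte `offset`.
read_range(src::VectorSource, offset::Integer, len::Integer) =
    view(src.data, offset+1:offset+len)

function read_range(src::IOSource, offset::Integer, len::Integer)
    seek(src.io, offset)
    buf = Vector{UInt8}(undef, len)
    nread = readbytes!(src.io, buf, len)
    nread == len || error("short read: wanted $len bytes, got $nread")
    return buf
end
```

Code written against `read_range` does not need to know which source it has. That is what lets the same column-chunk logic run over a `Vector{UInt8}` or an open file.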
The file path is not used apart from the initial opening of the file and the filtering of partitioned datasets. Those may work too, with minor changes, if we use URLs instead. I have...
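Partition filtering can stay agnostic to paths versus URLs if it only inspects the location string. Below is a minimal sketch. It assumes Hive-style `key=value` directory segments and `/` separators, and the helpers `partition_values` and `keep_partition` are hypothetical, not Parquet.jl API.

```julia
# Hypothetical helpers (not Parquet.jl API) for filtering a Hive-style
# partitioned dataset, e.g. `.../year=2020/month=01/part-0.parquet`.
# Only string handling is involved, so local paths and URLs are treated
# alike; '/' is assumed as the separator.
function partition_values(location::AbstractString)
    # Drop a URL query string or fragment, if any.
    path = first(split(location, r"[?#]"))
    vals = Dict{String,String}()
    # Only directory segments carry partition keys, so skip the file name.
    for seg in split(path, '/')[1:end-1]
        occursin('=', seg) || continue
        k, v = split(seg, '='; limit=2)
        vals[String(k)] = String(v)
    end
    return vals
end

# Keep a file only if every requested key is present with the requested value.
function keep_partition(location::AbstractString, filters::AbstractDict)
    vals = partition_values(location)
    return all(get(vals, k, nothing) == v for (k, v) in filters)
end
```

Listing the files under a URL prefix is a separate problem and is not covered here. The sketch only shows that the filtering step itself would not need to change.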
Yes, I think S3FS via FUSE may work well in this case.
It does seem that way from the s3fs documentation, and I did not see files being written when I tried it. But it claims that using the cache may make...
Data does get read in chunks internally. Could you post some benchmark results comparing the timings of Parquet.jl master with the last release?
I was able to trace some issues; PR here: https://github.com/JuliaIO/Parquet.jl/pull/77. But more can be done.
I tried the fork, following the readme pointed to above. Not sure if I got it right, but `read_parquet` seems to return a single row in 13 sec. ``` julia>...
Okay, I see the column now. But it looks like you are disregarding the nested schema? So this will work only for non-nested data?
And you seem to be using the same underlying page and column chunk read methods of Parquet.jl, right? So the current performance issues are on the layer above that.