Tanmay Mohapatra comments

Results: 139 comments of Tanmay Mohapatra

Reading from raw bytes?

Having the processing functions work on `Vector{UInt8}` would be useful for files that fit into memory. This would also work for files on disk that can be memory...
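A minimal sketch of the memory-mapping idea, assuming only Julia's `Mmap` standard library and the standard Parquet file layout: the magic bytes `PAR1` at both ends of the file, with a 4-byte little-endian footer length just before the trailing magic. The helper `mmap_parquet_footer` and the constant `PARQUET_MAGIC` are hypothetical names for illustration; they are not part of Parquet.jl, and the sketch does not decode the footer.

```julia
using Mmap

# Standard Parquet magic bytes, present at the start and end of every file.
const PARQUET_MAGIC = Vector{UInt8}("PAR1")

# Hypothetical helper (not part of Parquet.jl): memory-map a Parquet file as a
# Vector{UInt8} and return it together with a view of the serialized footer.
# Assumes a well-formed file; it only checks the magic bytes and lengths.
function mmap_parquet_footer(path::AbstractString)
    filesize(path) >= 12 || error("too small to be a Parquet file: $path")
    # The OS loads pages lazily, so a large file is not read up front.
    bytes = open(Mmap.mmap, path)
    n = length(bytes)
    bytes[1:4] == PARQUET_MAGIC || error("missing leading PAR1 magic")
    bytes[n-3:n] == PARQUET_MAGIC || error("missing trailing PAR1 magic")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    footer_len = 0
    for i in 0:3
        footer_len |= Int(bytes[n-7+i]) << (8i)
    end
    footer_start = n - 7 - footer_len
    footer_start >= 5 || error("footer length $footer_len exceeds file size")
    # A view avoids copying the footer bytes out of the mapping.
    return bytes, view(bytes, footer_start:n-8)
end
```

Since the result is an ordinary `Vector{UInt8}`, wrapping it as `IOBuffer(bytes)` would give an in-memory `IO` that existing stream-based reading code could consume, which is one way the same processing functions might serve both in-memory and on-disk data.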

Yes, the reads need to be buffered by the abstraction, of course. And most data access in this package is actually for reasonably large chunks of data, with...

The file path is not used apart from the initial opening of the file and for filtering partitioned datasets. Those may work too, with minor changes, if we use URLs instead. I have...

Yes, I think S3FS via FUSE may work well in this case.

It does seem so from the s3fs documentation, and I did not see files being written when I tried it. But it claims that using a cache may make...

Serious performance issues with read

Data does get read in chunks internally. Could you post some benchmark results comparing the timings of Parquet.jl master with the last release?

I was able to trace some of the issues; PR here: https://github.com/JuliaIO/Parquet.jl/pull/77. But more can be done.

I tried the fork, following the README linked above. I'm not sure I got it right, but `read_parquet` seems to return a single row in 13 seconds. ``` julia>...

Okay, I see the column now. But it looks like you are disregarding the nested schema? So this will work only for non-nested data?

And you seem to be using the same underlying page and column-chunk read methods of Parquet.jl, right? So the current performance issues are in the layer above that.