arrow-julia
Non-reproducible ReadOnlyMemoryError
This is not the kind of issue I like to write, because I have no minimal example with a reproducible error. Still, perhaps the information I have tells you something:
data = DataFrame(Arrow.Table(input))
Matrix{Float64}(data[:,cols])
has in some cases led to:
ERROR: TaskFailedException
nested task error: ReadOnlyMemoryError()
Stacktrace:
[1] copy
@ ./array.jl:349 [inlined]
[2] copy
@ ~/.julia/packages/Arrow/k23fl/src/arraytypes/primitive.jl:37 [inlined]
[3] _preprocess_column(col::Arrow.Primitive{Float64, Vector{Float64}}, len::Int64, copycols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:221
[4] (::DataFrames.var"#150#152"{Bool, Vector{AbstractVector{T} where T}, Int64})()
@ DataFrames ./threadingconstructs.jl:169
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:369
[2] macro expansion
@ ./task.jl:388 [inlined]
[3] DataFrames.DataFrame(columns::Vector{AbstractVector{T} where T}, colindex::DataFrames.Index; copycols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:195
[4] manipulate(df::DataFrames.DataFrame, args::Vector{Int64}; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:1418
[5] manipulate(df::DataFrames.DataFrame, c::Vector{String}; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:1427
[6] #select#473
@ ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:926 [inlined]
[7] getindex(df::DataFrames.DataFrame, row_ind::Colon, col_inds::Vector{String})
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:583
[8] infer(output::String; input::String, params::String)
@ App /app/App/src/infer.jl:6
I currently have no access to the arrow file or situation causing this problem, but I will investigate further.
I would check whether you still get the error with DataFrame(Arrow.Table(input); copycols=true)
Hi, thanks for the answer. The problem is a corrupt Arrow file. Strangely, reading it in Julia succeeds on the second try.
pyarrow returns:
OSError: Expected to be able to read 13350733000 bytes for message body, got 13093734914
It is a 13 GB file that was written from a DataFrame to Arrow with Arrow.write. The error has occurred with varying input data.
Hmmmm, yes, this is tricky to diagnose. The ReadOnlyMemoryError and the pyarrow error seem to suggest that the Arrow.Primitive may be pointing to invalid memory; i.e. there are only 1000 bytes to read from, but the Arrow.Primitive assumes it can access 1050. We could probably add some extra validation when reading to also check that case.
But more concerning is that Arrow.write seems to be writing in the metadata that the file/buffers are a certain size, but not actually writing that many bytes. If you happen to have a way to share the file privately with me, I'm happy to try and reproduce and take a look (I've set up private transfers w/ folks before; let me know if possible).
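For illustration, the kind of read-time validation mentioned above might look roughly like the sketch below. It uses only Base Julia and the 1000 vs. 1050 byte figures from the example; check_buffer_bounds is a hypothetical helper name, not an existing Arrow.jl function, and the real reader would need to apply such a check to each buffer described in the metadata.

```julia
# Hypothetical helper (NOT part of Arrow.jl): verify that a buffer which the
# metadata describes as `len` bytes starting at 0-based byte `offset`
# actually fits inside the bytes that were read or memory-mapped.
function check_buffer_bounds(bytes::AbstractVector{UInt8}, offset::Integer, len::Integer)
    available = length(bytes) - offset
    if offset < 0 || len < 0 || len > available
        throw(ArgumentError(
            "buffer claims $len bytes at offset $offset, but only $(max(available, 0)) are available"))
    end
    return true
end

data = zeros(UInt8, 1000)            # only 1000 bytes are actually present
check_buffer_bounds(data, 0, 1000)   # fits, returns true

# A claim of 1050 bytes is rejected up front instead of touching invalid memory.
try
    check_buffer_bounds(data, 0, 1050)
catch err
    println(err.msg)
end
```

A check like this would turn the sporadic ReadOnlyMemoryError into a clear error at read time, though it would not explain why the written file is shorter than its metadata says.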