Non-reproducible ReadOnlyMemoryError

Open altre opened this issue 3 years ago • 3 comments

This is not the kind of issue I like to write, because I have no minimal example with a reproducible error. Still, perhaps the information I have tells you something:

data = DataFrame(Arrow.Table(input))
Matrix{Float64}(data[:,cols])

has in some cases led to:

ERROR: TaskFailedException
nested task error: ReadOnlyMemoryError()
Stacktrace:
[1] copy
@ ./array.jl:349 [inlined]
[2] copy
@ ~/.julia/packages/Arrow/k23fl/src/arraytypes/primitive.jl:37 [inlined]
[3] _preprocess_column(col::Arrow.Primitive{Float64, Vector{Float64}}, len::Int64, copycols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:221
[4] (::DataFrames.var"#150#152"{Bool, Vector{AbstractVector{T} where T}, Int64})()
@ DataFrames ./threadingconstructs.jl:169
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:369
[2] macro expansion
@ ./task.jl:388 [inlined]
[3] DataFrames.DataFrame(columns::Vector{AbstractVector{T} where T}, colindex::DataFrames.Index; copycols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:195
[4] manipulate(df::DataFrames.DataFrame, args::Vector{Int64}; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:1418
[5] manipulate(df::DataFrames.DataFrame, c::Vector{String}; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:1427
[6] #select#473
@ ~/.julia/packages/DataFrames/nxjiD/src/abstractdataframe/selection.jl:926 [inlined]
[7] getindex(df::DataFrames.DataFrame, row_ind::Colon, col_inds::Vector{String})
@ DataFrames ~/.julia/packages/DataFrames/nxjiD/src/dataframe/dataframe.jl:583
[8] infer(output::String; input::String, params::String)
@ App /app/App/src/infer.jl:6

I currently have no access to the arrow file or situation causing this problem, but I will investigate further.

altre, Jun 23 '21 06:06

I would check whether you still get the error with DataFrame(Arrow.Table(input); copycols=true).

ericphanson, Jun 23 '21 10:06

Hi, thanks for the answer. The problem is a corrupt Arrow file. Reading it in Julia strangely succeeds on the second try. pyarrow returns:

OSError: Expected to be able to read 13350733000 bytes for message body, got 13093734914

It is a 13 GB file that was written from a DataFrame with Arrow.write. The error has occurred with varying input data.
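To put the pyarrow message in perspective, the two byte counts it reports can simply be subtracted. The snippet below is a small illustration in plain Julia; the variable names are made up for this example, and only the two numbers come from the error message.

```julia
# Byte counts copied from the pyarrow OSError above.
expected = 13_350_733_000   # bytes the message metadata says the body contains
actual   = 13_093_734_914   # bytes pyarrow could actually read

# How much of the message body is missing from the file.
shortfall = expected - actual
percent_missing = round(100 * shortfall / expected; digits = 2)

println("missing bytes:   ", shortfall)
println("missing percent: ", percent_missing)
```

The shortfall is 256,998,086 bytes, so roughly 257 MB at the end of the message body is absent from the file.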

altre, Jun 23 '21 14:06

Hmmmm, yes, this is tricky to diagnose. The ReadOnlyMemoryError and the pyarrow error together suggest that the Arrow.Primitive may be pointing to invalid memory; for example, there are only 1000 bytes to read from, but the Arrow.Primitive assumes it can access 1050. We could probably add some extra validation when reading to also check that case.
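As a rough illustration of what such a check could look like, here is a minimal sketch in plain Julia. The function `buffer_in_bounds` and its arguments are hypothetical and are not part of Arrow.jl; the sketch only assumes that the reader knows how many bytes are actually available and which offset and length the metadata claims for a buffer.

```julia
# Hypothetical helper, not Arrow.jl API: returns true only if the byte range
# [offset, offset + len) described by the metadata fits inside `available`
# bytes of backing memory.
function buffer_in_bounds(available::Integer, offset::Integer, len::Integer)
    if offset < 0 || len < 0
        return false          # negative values can never describe a valid buffer
    end
    return offset + len <= available
end

# The situation described above: 1000 bytes available, 1050 claimed.
println(buffer_in_bounds(1000, 0, 1050))   # out of bounds
println(buffer_in_bounds(1000, 0, 1000))   # fits exactly
```

A reader could run a check like this once per buffer and raise a descriptive error up front, instead of failing later with a ReadOnlyMemoryError when the data is first copied.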

But more concerning is that Arrow.write seems to be writing in the metadata that the file/buffers are a certain size, but not actually writing that many bytes. If you happen to have a way to share the file privately with me, I'm happy to try to reproduce and take a look (I've set up private transfers w/ folks before; let me know if possible).

quinnj, Jun 23 '21 19:06