arrow-julia
Unable to write or read metadata from a Table after attaching metadata to the table
Hey JuliaData team,
Recently ran into an odd bug: I could not write an Arrow Table to an Arrow file after attaching metadata to that table via Arrow.setmetadata!. My dictionary was correctly typed as Dict{String, String}, and I was able to see my metadata attached to the table.
However, when I tried writing this table to a file, my REPL locked up and hung forever. Even after pressing C-c multiple times, it did not quit or stop processing but rather said "Warning: Force throwing a sigint" and still did not do anything. I had to manually close my terminal, as the hang had locked it up too.
Any thoughts as to why this could be happening? It was a rather large file so maybe that could've done something? I tried to reproduce this behavior below with a toy example:
using Arrow
using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
Arrow.write("test.arrow", df)
at = Arrow.Table("test.arrow")
my_meta = Dict("A" => "Cool numbers", "B" => "Cool letters")
Arrow.setmetadata!(at, my_meta)
Which shows this as expected:
julia> at
Arrow.Table with 4 rows, 2 columns, and schema:
:A Int64
:B String
with metadata given by a Dict{String, String} with 2 entries:
"B" => "Cool letters"
"A" => "Cool numbers"
But then continuing on
Arrow.write("meta.arrow", at)
mat = Arrow.Table("meta.arrow")
produces the following:
julia> mat = Arrow.Table("meta.arrow")
Arrow.Table with 4 rows, 2 columns, and schema:
:A Int64
:B String
julia> Arrow.getmetadata(mat)
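where no metadata is shown. So I hit two different issues in the two ways I tried to get metadata saved with my data: the write hanging on the large file, and the metadata going missing after a round trip in the toy example. Am I doing something wrong here? Thanks all!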
I think this might be https://github.com/JuliaData/Arrow.jl/issues/211 (but see also https://github.com/JuliaData/Arrow.jl/issues/90#issuecomment-914870675 for plans to soon switch the metadata system)
Yup, the missing metadata part looks like #211.
However, when I tried writing this table to a file, my REPL locked up and hung forever. Even after pressing C-c multiple times, it did not quit or stop processing but rather said "Warning: Force throwing a sigint" and still did not do anything. I had to manually close my terminal as it locked up the terminal.
Hmmm. This part is probably unrelated to #211, but would be interested if you had a way to reproduce it! It might just be a performance bug?
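One way to tell a slow write from a true hang is to time Arrow.write on growing row subsets and see how the cost scales. The following is only a minimal sketch: time_writes is a hypothetical helper (not part of Arrow.jl), and it assumes the data can be materialized as a DataFrame.

```julia
using Arrow
using DataFrames

# Hypothetical helper (not an Arrow.jl function): write the first n rows of `df`
# to a temporary file for each n in `sizes`, and return the elapsed seconds.
function time_writes(df::DataFrame, sizes)
    timings = Float64[]
    for n in sizes
        sub = first(df, min(n, nrow(df)))      # first n rows (or all rows if fewer)
        path = tempname() * ".arrow"
        elapsed = @elapsed Arrow.write(path, sub)
        println(n, " rows: ", round(elapsed; digits = 3), " s")
        push!(timings, elapsed)
        rm(path; force = true)                 # clean up the temporary file
    end
    return timings
end
```

If the timings grow much faster than the row count, that would point toward a performance problem rather than a deadlock.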
Ah yeah, I think that part of my issue is the same as #211, @ericphanson and @jrevels. Thanks for the help there.
@jrevels - regarding the latter, this example is a bit more involved but here goes:
using ReadStatTables
using DataFrames
using Arrow
download("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DEMO_J.XPT", "data.xpt")
xpt_data = readstat("data.xpt")
xpt_meta = getmeta(xpt_data)
meta_labels = Dict(string(k) => v for (k, v) in pairs(xpt_meta.labels))
Arrow.write("test.arrow", DataFrame(xpt_data))
arrow_table = Arrow.Table("test.arrow")
Arrow.setmetadata!(arrow_table, meta_labels)
Arrow.write("meta_test.arrow", arrow_table)
Let me know if you run into any issues with this! Thanks!
@TheCedarPrince I didn't get a chance to check on this recently, but are you still encountering this issue as of Arrow.jl v2.1? (That release contains a fix for #211.)
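For anyone retesting on a newer release, here is a minimal sketch of the toy example adapted to the reworked metadata system. It assumes Arrow.jl v2.x, where (as far as I know) table-level metadata is passed through the metadata keyword of Arrow.write instead of being attached with Arrow.setmetadata!; please check the docs for your installed version.

```julia
using Arrow
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
my_meta = Dict("A" => "Cool numbers", "B" => "Cool letters")

# Assumption: Arrow.jl v2.x accepts table-level metadata via the `metadata` keyword.
Arrow.write("meta.arrow", df; metadata = my_meta)

# Read the file back and look up the metadata stored with the table.
mat = Arrow.Table("meta.arrow")
meta = Arrow.getmetadata(mat)
```

If the round trip works, meta should contain the same two entries as my_meta.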