
Unable to write or read metadata from a Table after attaching metadata to the table

Open · TheCedarPrince opened this issue 3 years ago · 4 comments

Hey JuliaData team,

I recently ran into an odd bug: after attaching metadata to an Arrow Table with `setmetadata!`, I could not write that table to an Arrow file. My dictionary was correctly typed as `Dict{String, String}`, and I could see the metadata attached to the table.

However, when I tried to write this table to a file, my REPL locked up and hung indefinitely. Even after pressing Ctrl-C several times, it did not stop processing; it only printed `Warning: Force throwing a sigint` and still did nothing. I had to close my terminal manually because it had locked up.

Any thoughts on why this could be happening? It was a rather large file, so maybe that played a role. I tried to reproduce the behavior below with a toy example:

```julia
using Arrow
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
Arrow.write("test.arrow", df)

at = Arrow.Table("test.arrow")

my_meta = Dict("A" => "Cool numbers", "B" => "Cool letters")

Arrow.setmetadata!(at, my_meta)
```

This displays the metadata as expected:

```julia
julia> at
Arrow.Table with 4 rows, 2 columns, and schema:
 :A  Int64
 :B  String

with metadata given by a Dict{String, String} with 2 entries:
  "B" => "Cool letters"
  "A" => "Cool numbers"
```

But then, continuing with:

```julia
Arrow.write("meta.arrow", at)

mat = Arrow.Table("meta.arrow")
```

produces the following:

```julia
julia> mat = Arrow.Table("meta.arrow")
Arrow.Table with 4 rows, 2 columns, and schema:
 :A  Int64
 :B  String

julia> Arrow.getmetadata(mat)
```

No metadata is shown. So I hit two different problems with two different attempts to save metadata with my data. Am I doing something wrong here? Thanks, all!

TheCedarPrince · Sep 11 '21 00:09

I think this might be https://github.com/JuliaData/Arrow.jl/issues/211 (but see also https://github.com/JuliaData/Arrow.jl/issues/90#issuecomment-914870675 for plans to switch the metadata system soon).

ericphanson · Sep 11 '21 00:09

Yup, the missing metadata part looks like #211.

> However, when I tried writing this table to a file, my REPL locked up and hung forever. Even after pressing C-c multiple times, it did not quit or stop processing but rather said `Warning: Force throwing a sigint` and still did not do anything. I had to manually close my terminal as it locked up the terminal.

Hmm. This part is probably unrelated to #211, but I'd be interested if you have a way to reproduce it! It might just be a performance bug?

jrevels · Sep 12 '21 22:09

Ah yeah, @ericphanson and @jrevels, I think that part of my issue is the same as #211. Thanks for the help there.

@jrevels, regarding the hang: this example is a bit more involved, but here goes:

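```julia
using ReadStatTables
using DataFrames
using Arrow

# NHANES 2017-2018 demographics file in SAS transport (XPT) format
download("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DEMO_J.XPT", "data.xpt")
xpt_data = readstat("data.xpt")

# Variable labels, keyed by column name as a String
xpt_meta = getmeta(xpt_data)
meta_labels = Dict(string(k) => v for (k, v) in pairs(xpt_meta.labels))

# Note: the function is `Arrow.write` (lowercase); `Arrow.Write` does not exist
Arrow.write("test.arrow", DataFrame(xpt_data))
arrow_table = Arrow.Table("test.arrow")

# Attach the labels as table metadata, then write again (this is where it hangs)
Arrow.setmetadata!(arrow_table, meta_labels)
Arrow.write("meta_test.arrow", arrow_table)
```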

Let me know if you run into any issues with this! Thanks!

TheCedarPrince · Sep 13 '21 17:09

@TheCedarPrince I haven't had a chance to check on this recently, but are you still encountering this issue as of Arrow.jl v2.1? (That release contains a fix for #211.)

jrevels · Oct 27 '21 14:10