DataFrames.jl

Feature Request: Automatic saving of metadata with arrow files

Open alex-s-gardner opened this issue 1 year ago • 2 comments

It would be great if a DataFrame and its metadata could be saved to a single file. I believe that Arrow supports this.

Right now, metadata is not automatically saved when a DataFrame is written to an Arrow file. I believe a PR was opened, but it appears to have stalled.

It would be great to have this functionality.

Current status of saving to Arrow with metadata:

```julia
using Arrow
using DataFrames

# Round trip without metadata works as expected
df = DataFrame(a = 1:3, b = 'A':'C')
Arrow.write("test.arrow", df)
df = DataFrame(Arrow.Table("test.arrow"))

# Attach column-level metadata and confirm it can be read back
colmetadata!(df, :a, "test", "hope this works"; style = :note)
colmetadata(df, :a, "test")

# Round trip again: the metadata is not written to the file
Arrow.write("test2.arrow", df)
df = DataFrame(Arrow.Table("test2.arrow"))
colmetadata(df, :a, "test")
```

```
ERROR: ArgumentError: no column-level metadata found for column "a"
Stacktrace:
 [1] colmetadata(df::DataFrame, col::Symbol, key::String, default::DataFrames.MetadataMissingDefault; style::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:367
 [2] colmetadata
   @ ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360 [inlined]
 [3] colmetadata(df::DataFrame, col::Symbol, key::String)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360
 [4] top-level scope
   @ ~/Documents/GitHub/ItsLivePlayground.jl/src/RiverTest.jl:41
```

alex-s-gardner, Nov 28 '24 22:11

This is an issue with Arrow.jl. Hopefully the stalled PR will soon be merged and released by the maintainers.

bkamins, Nov 29 '24 12:11

https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/table.jl#L365-L366

https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/write.jl#L48

It looks like support for writing metadata could be a simple change to the generic implementation of the getmetadata function in Arrow.jl?

It would have to get the metadata dictionary from DataAPI, and then convert the contents to strings.
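
Until something like that lands, the same conversion can be done by hand on the caller's side. The sketch below is not the proposed Arrow.jl change. It assumes that Arrow.write accepts a colmetadata keyword (a dictionary mapping column Symbols to string pairs) and that Arrow.getmetadata returns a column's metadata on read, which is my understanding of recent Arrow.jl releases. The helper name string_colmetadata and the file name test3.arrow are made up for illustration.

```julia
using Arrow
using DataFrames

# Hypothetical helper (not part of Arrow.jl or DataFrames.jl): gather
# column-level metadata through the DataAPI accessors that DataFrames
# re-exports, stringifying every value because Arrow custom metadata
# only holds string pairs.
function string_colmetadata(df::AbstractDataFrame)
    out = Dict{Symbol,Dict{String,String}}()
    for col in names(df)
        keys_for_col = colmetadatakeys(df, col)
        isempty(keys_for_col) && continue  # skip columns without metadata
        out[Symbol(col)] = Dict(
            String(key) => string(colmetadata(df, col, key)) for key in keys_for_col
        )
    end
    return out
end

df = DataFrame(a = 1:3, b = 'A':'C')
colmetadata!(df, :a, "test", "hope this works"; style = :note)

# Pass the metadata explicitly instead of relying on automatic detection.
# Assumption: the `colmetadata` keyword is available in the installed Arrow.jl.
Arrow.write("test3.arrow", df; colmetadata = string_colmetadata(df))

tbl = Arrow.Table("test3.arrow")
restored = Arrow.getmetadata(tbl.a)  # metadata attached to column `a`
restored["test"]
```

Metadata styles such as :note are not stored this way, and non-string values come back as strings, so both would need to be restored manually (for example with colmetadata!) after loading.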

asinghvi17, Dec 18 '24 18:12