Feature Request: Automatic saving of metadata with arrow files
It would be great if a DataFrame and its metadata could be saved to a single file. I believe that the Arrow format supports this.
Right now metadata is not automatically saved when a DataFrame is written to an Arrow file. I believe a PR was opened, but it appears to have stalled.
It would be great to have this functionality.
Current status of saving to Arrow with metadata:
using Arrow
using DataFrames
# Write a plain data frame and read it back
df = DataFrame(a = 1:3, b = 'A':'C')
Arrow.write("test.arrow", df)
df = DataFrame(Arrow.Table("test.arrow"))
# Attach column-level metadata and confirm it is present
colmetadata!(df, :a, "test", "hope this works"; style = :note)
colmetadata(df, :a, "test")
# Round-trip through Arrow again: the metadata is lost
Arrow.write("test2.arrow", df)
df = DataFrame(Arrow.Table("test2.arrow"))
colmetadata(df, :a, "test")
ERROR: ArgumentError: no column-level metadata found for column "a"
Stacktrace:
[1] colmetadata(df::DataFrame, col::Symbol, key::String, default::DataFrames.MetadataMissingDefault; style::Bool)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:367
[2] colmetadata
@ ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360 [inlined]
[3] colmetadata(df::DataFrame, col::Symbol, key::String)
@ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360
[4] top-level scope
@ ~/Documents/GitHub/ItsLivePlayground.jl/src/RiverTest.jl:41
This is an issue with Arrow.jl. Hopefully the stalled PR will soon be merged and released by the maintainers.
https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/table.jl#L365-L366
https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/write.jl#L48
It looks like this could be a simple change to the generic implementation of the getmetadata function in Arrow.jl, so that metadata is picked up when writing?
It would have to get the metadata dictionary through DataAPI and then convert its contents to strings, since Arrow custom metadata is stored as string key-value pairs.
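Until something like that lands, here is a rough sketch of a manual workaround. It is not the proposed Arrow.jl change itself. `stringified_metadata` is a hypothetical helper of my own, and I'm assuming that the `metadata` and `colmetadata` keyword arguments of `Arrow.write` accept string pairs and a `Symbol`-keyed dictionary of string pairs, respectively, in your installed version.

```julia
using Arrow, DataFrames

# Hypothetical helper (not part of Arrow.jl or DataFrames.jl): collect table- and
# column-level metadata through the DataAPI functions that DataFrames re-exports,
# converting every key and value to a String.
function stringified_metadata(df::AbstractDataFrame)
    tablemeta = Dict{String,String}(
        string(k) => string(metadata(df, k)) for k in metadatakeys(df)
    )
    colmeta = Dict{Symbol,Dict{String,String}}()
    # colmetadatakeys(df) yields `column => keys` pairs for columns that have metadata
    for (col, ks) in colmetadatakeys(df)
        colmeta[Symbol(col)] = Dict{String,String}(
            string(k) => string(colmetadata(df, col, k)) for k in ks
        )
    end
    return tablemeta, colmeta
end

df = DataFrame(a = 1:3, b = 'A':'C')
colmetadata!(df, :a, "test", "hope this works"; style = :note)

tablemeta, colmeta = stringified_metadata(df)

# Pass the metadata explicitly; skip the table-level keyword when there is none.
Arrow.write("test2.arrow", df;
            metadata = isempty(tablemeta) ? nothing : tablemeta,
            colmetadata = colmeta)

# The column metadata should then be readable from the Arrow column itself.
tbl = Arrow.Table("test2.arrow")
Arrow.getmetadata(tbl.a)
```

Note that this is lossy: the metadata style (`:note` here) is dropped, and non-string values only survive as their `string(...)` representation.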