ReadStatTables.jl

Table level metadata not retrievable after export

Open • RagavRajan opened this issue 8 months ago • 1 comment

Consider the R haven package. Metadata set on a data.frame can be retrieved after export and import.

library(haven)

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Add metadata
attr(df, "author") <- "John Doe"

write_xpt(df, "df.xpt")

df_read <- read_xpt("df.xpt")

attributes(df)
#> $names
#> [1] "Name" "Age" 
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2
#> 
#> $author
#> [1] "John Doe"

Created on 2025-03-07 with reprex v2.1.1

In ReadStatTables, table-level metadata set on a DataFrame before export is not retrievable after import. It is overwritten by file-level metadata.

using ReadStatTables, DataFrames

df = DataFrame(Name=["Alice", "Bob"], Age=[25, 30])

metadata!(df, "author", "John Doe", style=:note)

writestat("df.xpt", df)

df_read = readstat("df.xpt") |> DataFrame

metadata(df_read)
#> Dict{String, Any} with 13 entries:
#>   "file_ext"             => ".xpt"
#>   "modified_time"        => DateTime("2025-03-07T12:33:07")
#>   "file_format_version"  => 5
#>   "file_format_is_64bit" => false
#>   "table_name"           => "DATASET"
#>   "notes"                => String[]
#>   "file_encoding"        => ""
#>   "file_label"           => ""
#>   "var_count"            => 2
#>   "row_count"            => -1
#>   "creation_time"        => DateTime("2025-03-07T12:33:07")
#>   "endianness"           => READSTAT_ENDIAN_NONE
#>   "compression"          => READSTAT_COMPRESS_NONE
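
To make the loss easy to check programmatically, here is a minimal sketch that repeats the round trip above and lists the table-level keys that were set before export but are missing after import. It assumes DataFrames.jl with metadata support (`metadata!`, `metadatakeys`) and the same `writestat`/`readstat` calls used above; the temporary path and the variable `lost` are only for illustration.

```julia
using DataFrames, ReadStatTables

df = DataFrame(Name=["Alice", "Bob"], Age=[25, 30])
metadata!(df, "author", "John Doe", style=:note)

# Write to a temporary directory so the check leaves no files behind
path = joinpath(mktempdir(), "df.xpt")
writestat(path, df)
df_read = DataFrame(readstat(path))

# Table-level keys set before export that are absent after import
lost = setdiff(collect(metadatakeys(df)), collect(metadatakeys(df_read)))
println(lost)
```

If the behavior reported above holds, `lost` should contain "author", since that key does not appear among the 13 entries returned by `metadata(df_read)`.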

RagavRajan • Mar 07 '25 07:03

Interesting. I am not sure where haven stores the author attribute if it's not something special about xpt files.

junyuan-chen • Mar 07 '25 07:03