ReadStatTables.jl
Writing a `DataFrame` with missing values in `String` columns
Consider the haven R package. When it writes out a character column that contains NA, each NA is converted into an empty string "":
library(haven)
df_write <- data.frame(USUBJID = c("01-701-1028", NA, NA), EVID = c(NA, 0, 1))
df_write
#> USUBJID EVID
#> 1 01-701-1028 NA
#> 2 <NA> 0
#> 3 <NA> 1
write_xpt(df_write, "df.xpt")
df_read <- read_xpt("df.xpt")
df_read
#> # A tibble: 3 × 2
#> USUBJID EVID
#> <chr> <dbl>
#> 1 "01-701-1028" NA
#> 2 "" 0
#> 3 "" 1
Created on 2025-03-07 with reprex v2.1.1
But in ReadStatTables, writing the equivalent `DataFrame` throws an error:
using DataFrames, ReadStatTables
df_write = DataFrame(USUBJID=["01-701-1028", missing, missing], EVID=[missing, 0, 1])
df_write
#> 3×2 DataFrame
#> Row │ USUBJID EVID
#> │ String? Int64?
#> ─────┼──────────────────────
#> 1 │ 01-701-1028 missing
#> 2 │ missing 0
#> 3 │ missing 1
writestat("df.xpt", df_write)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type String
Closest candidates are:
convert(::Type{String}, ::StringManipulation.Decoration)
@ StringManipulation ~/.julia/packages/StringManipulation/5Zfrz/src/decorations.jl:315
convert(::Type{String}, ::Base.JuliaSyntax.Kind)
@ Base /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/base/JuliaSyntax/src/kinds.jl:975
convert(::Type{String}, ::String)
@ Base essentials.jl:321
...
Stacktrace:
[1] setindex!(A::Vector{String}, x::Missing, i1::Int64)
@ Base ./array.jl:1021
[2] _unsafe_copyto!(dest::Vector{String}, doffs::Int64, src::Vector{Union{Missing, String}}, soffs::Int64, n::Int64)
@ Base ./array.jl:299
[3] unsafe_copyto!
@ ./array.jl:353 [inlined]
[4] _copyto_impl!
@ ./array.jl:376 [inlined]
[5] copyto!
@ ./array.jl:363 [inlined]
[6] copyto!(dest::Vector{String}, src::Vector{Union{Missing, String}})
@ Base ./array.jl:385
[7] ReadStatTable(table::DataFrame, ext::String; copycols::Bool, refpoolaslabel::Bool, vallabels::Dict{…}, hasmissing::Vector{…}, meta::ReadStatMeta, colmeta::StructArrays.StructVector{…}, varformat::Nothing, styles::Dict{…}, maxdispwidth::Int64, kwargs::@Kwargs{})
@ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:0
[8] ReadStatTable(table::DataFrame, ext::String)
@ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:88
[9] writestat(filepath::String, table::DataFrame; ext::String, kwargs::@Kwargs{})
@ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:316
[10] writestat(filepath::String, table::DataFrame)
@ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:312
So `String` columns that contain missing values cannot be written out.
@RagavRajan Thank you for reporting this. The current version requires users to deal with any Missing before calling writestat. I am aware of this issue, which is inconvenient and unintuitive for users.
I have already made some changes locally so that Missing is handled automatically, but I have not opened a PR yet. I expect to address this issue in the next patch release, which will probably happen this month.
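In the meantime, a possible workaround is to replace `missing` with `""` in the string columns before calling `writestat`, which mirrors what haven does with NA. The following is only a minimal sketch: `fill_missing_strings` is a hypothetical helper name, not part of ReadStatTables, and it assumes that an empty string is an acceptable stand-in for `missing` in your data.

```julia
using DataFrames

# Hypothetical helper (not part of ReadStatTables): returns a copy of `df`
# in which every `missing` in a string column is replaced by "".
# Non-string columns are left untouched.
function fill_missing_strings(df::AbstractDataFrame)
    out = copy(df)  # copy so the caller's table is not modified
    for name in names(out)
        col = out[!, name]
        # nonmissingtype strips Missing from Union{Missing, String}
        if nonmissingtype(eltype(col)) <: AbstractString
            # coalesce returns the first non-missing argument, so the
            # resulting column no longer allows missing
            out[!, name] = coalesce.(col, "")
        end
    end
    return out
end

df_write = DataFrame(USUBJID=["01-701-1028", missing, missing], EVID=[missing, 0, 1])
df_clean = fill_missing_strings(df_write)
```

The cleaned table can then be passed on as `writestat("df.xpt", df_clean)`. I have not verified this against every column type, so treat it as a starting point rather than a guaranteed fix.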
That's good to know @junyuan-chen. I am looking forward to it.