ReadStatTables.jl

Writing `DataFrame` with missing values in `String` columns

Open RagavRajan opened this issue 8 months ago • 2 comments

Consider the haven R package. When it writes out a character column that contains NA, each NA is converted into an empty string "":

library(haven)

df_write <- data.frame(USUBJID = c("01-701-1028", NA, NA), EVID = c(NA, 0, 1))
df_write 
#>       USUBJID EVID
#> 1 01-701-1028   NA
#> 2        <NA>    0
#> 3        <NA>    1

write_xpt(df_write, "df.xpt")

df_read <- read_xpt("df.xpt")
df_read
#> # A tibble: 3 × 2
#>   USUBJID        EVID
#>   <chr>         <dbl>
#> 1 "01-701-1028"    NA
#> 2 ""                0
#> 3 ""                1

Created on 2025-03-07 with reprex v2.1.1

But in ReadStatTables, writing the equivalent table throws an error:

df_write = DataFrame(USUBJID=["01-701-1028", missing, missing], EVID=[missing, 0, 1])
df_write
#> 3×2 DataFrame
#>  Row │ USUBJID      EVID    
#>      │ String?      Int64?  
#> ─────┼──────────────────────
#>    1 │ 01-701-1028  missing 
#>    2 │ missing            0
#>    3 │ missing            1

writestat("df.xpt", df_write)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type String

Closest candidates are:
  convert(::Type{String}, ::StringManipulation.Decoration)
   @ StringManipulation ~/.julia/packages/StringManipulation/5Zfrz/src/decorations.jl:315
  convert(::Type{String}, ::Base.JuliaSyntax.Kind)
   @ Base /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-10/base/JuliaSyntax/src/kinds.jl:975
  convert(::Type{String}, ::String)
   @ Base essentials.jl:321
  ...

Stacktrace:
  [1] setindex!(A::Vector{String}, x::Missing, i1::Int64)
    @ Base ./array.jl:1021
  [2] _unsafe_copyto!(dest::Vector{String}, doffs::Int64, src::Vector{Union{Missing, String}}, soffs::Int64, n::Int64)
    @ Base ./array.jl:299
  [3] unsafe_copyto!
    @ ./array.jl:353 [inlined]
  [4] _copyto_impl!
    @ ./array.jl:376 [inlined]
  [5] copyto!
    @ ./array.jl:363 [inlined]
  [6] copyto!(dest::Vector{String}, src::Vector{Union{Missing, String}})
    @ Base ./array.jl:385
  [7] ReadStatTable(table::DataFrame, ext::String; copycols::Bool, refpoolaslabel::Bool, vallabels::Dict{…}, hasmissing::Vector{…}, meta::ReadStatMeta, colmeta::StructArrays.StructVector{…}, varformat::Nothing, styles::Dict{…}, maxdispwidth::Int64, kwargs::@Kwargs{})
    @ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:0
  [8] ReadStatTable(table::DataFrame, ext::String)
    @ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:88
  [9] writestat(filepath::String, table::DataFrame; ext::String, kwargs::@Kwargs{})
    @ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:316
 [10] writestat(filepath::String, table::DataFrame)
    @ ReadStatTables ~/.julia/packages/ReadStatTables/YJxyU/src/writestat.jl:312

String columns that contain missing values cannot be written out.

RagavRajan, Mar 07 '25 06:03

@RagavRajan Thank you for reporting this. The current version requires users to handle any `missing` values themselves before calling `writestat`. I am aware of this issue, which is inconvenient and unintuitive for users.

I have already made some local changes that handle `Missing` automatically, but I have not opened a PR yet. I expect to address this issue in the next patch release, which will probably happen this month.

junyuan-chen, Mar 07 '25 06:03
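
Until that release, the reply above implies that `missing` has to be handled by hand before writing. The following is a minimal sketch of one way to do that, assuming the goal is to mimic haven by turning `missing` into `""` in string columns only. It uses only DataFrames.jl and Base functions; the loop is an illustration and not part of ReadStatTables.

```julia
using DataFrames

df_write = DataFrame(USUBJID=["01-701-1028", missing, missing], EVID=[missing, 0, 1])

# Assumption: only string columns need the haven-style replacement.
# Other columns (such as EVID) are left untouched.
for name in names(df_write)
    col = df_write[!, name]
    if nonmissingtype(eltype(col)) <: AbstractString
        # coalesce replaces each `missing` with "" and narrows the
        # element type from Union{Missing, String} to String
        df_write[!, name] = coalesce.(col, "")
    end
end
```

After this step `USUBJID` is a plain `Vector{String}`, so the `copyto!` into a `Vector{String}` shown in the stack trace no longer meets a `missing`. The table can then be passed to `writestat("df.xpt", df_write)` as in the original report. The thread does not say whether the `missing` in the numeric `EVID` column needs similar treatment.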

That's good to know, @junyuan-chen. I am looking forward to it.

RagavRajan, Mar 07 '25 06:03