Allow either DataValues or Missings for representing missing data
@davidanthoff I wanted to run this by you before preparing a PR.
I think it should be possible to represent missing data values in the result using either the DataValues package or the Missings package. Suppose that all the read_* functions had an optional Bool argument to control this choice with its value being stored in the ReadStatDataFrame. The handle_variable! function then ends with
push!(ds.data, ds.useMissings ? fill(Union{jtype, Missing}(missing), ds.rows) : DataValueVector{ds.rows))
I think the only additional changes would be to use
dest::Union{DataValueVector{T}, Vector{Union{Missing,T}}}
in the signature of the readfield! methods and to add the Missings package.
Shall I prepare a PR for your consideration?
What is the use-case for this? https://github.com/davidanthoff/StatFiles.jl should support getting a DataFrame that uses Missing, and my idea here was that ReadStat.jl really is just a low-level library that most folks don't really interact with.
I believe that as of version 0.7 of Julia, the Missing type, the missing value and the ismissing function will be part of Base.
Yes, that is so. Should we revisit once 0.7 is out (or at least a beta or something like that)? There are some really complicated integration questions on how all of this interacts with Query.jl that just will require more time.
But I do hope that in the meantime the approach with StatFiles.jl should work for folks that want a DataFrame that has Vector{Union{T,Missing}} columns, that should all just work already.
I agree that waiting until 0.7 is at least in beta is a good idea.
I don't think 0.7 will change a lot of things (except performance) here. You could as well use Missings.jl for now, it should provide the same features on 0.6.
I have a version using Missing in the julia7 branch in https://github.com/dmbates/ReadStat.jl
I haven't created a pull request yet. I wanted to check first with @davidanthoff to see if he would be open to this approach. This version works on Julia 0.6.2 and Julia 0.7.0-dev.
Tests don't work because this returns Vector{Union{Missing,T}} in the data field. I can add/change tests if this approach is acceptable.