DataFramesMeta.jl

document the use of `rsubset` to preserve missings

Open vjd opened this issue 4 years ago • 3 comments

I'm trying to keep the rows where `b == 0` while also preserving the rows where `b` is missing:

julia> dd = DataFrame(a = [1, 2, 3], b = [1, missing, 0])
3×2 DataFrame
 Row │ a      b       
     │ Int64  Int64?  
─────┼────────────────
   1 │     1        1
   2 │     2  missing 
   3 │     3        0

julia> @chain dd begin
           @rsubset :b == 0 | ismissing(:b)
       end
1×2 DataFrame
 Row │ a      b      
     │ Int64  Int64? 
─────┼───────────────
   1 │     3       0

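However, I learned on Slack that `:b == 0` returns `missing` for the row where `b` is missing, and `@rsubset` drops rows whose condition evaluates to `missing`. (Separately, `|` binds more tightly than `==`, so the condition above parses as `:b == (0 | ismissing(:b))`; with explicit parentheses, `(:b == 0) | ismissing(:b)` would keep the row, since `missing | true` is `true`.) What we can do instead is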

julia> @chain dd begin
           @rsubset ismissing(:b) || :b == 0
       end

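Here the order matters: `||` short-circuits, so `ismissing(:b)` has to come first. With the operands reversed, `:b == 0` evaluates to `missing` on the missing row, and `missing || ...` throws a `TypeError` because `||` requires a `Bool` on its left-hand side.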
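To make the ordering point concrete, here is a minimal sketch in plain Julia (no packages) that applies both orderings to the three values of `b`. The helper names `keep_short_circuit` and `keep_reversed` are made up for this illustration and are not part of DataFramesMeta.jl.

```julia
b = [1, missing, 0]

# `ismissing` is checked first, so `x == 0` is never evaluated on `missing`.
keep_short_circuit(x) = ismissing(x) || x == 0

# Reversed operands: `missing == 0` is `missing`, and `||` needs a Bool on its left.
keep_reversed(x) = x == 0 || ismissing(x)

# Row-wise result of the working condition.
flags = map(keep_short_circuit, b)   # [false, true, true]

# The reversed condition fails as soon as it reaches the missing value.
reversed_error = try
    map(keep_reversed, b)
    nothing
catch err
    err                              # a TypeError (non-boolean used in boolean context)
end
```

The same row-wise logic is what `@rsubset` applies to each row of the data frame, which is why only the `ismissing(:b) || :b == 0` form works there.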

This is a subtle but important pattern, and it would be good to have it documented.

vjd avatar Nov 29 '21 21:11 vjd

Or use `coalesce(:b == 0, true)`, which is exactly why `coalesce` exists.
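A short sketch of the `coalesce` approach on the data frame from the original post, assuming DataFrames.jl and DataFramesMeta.jl are installed. The variable name `kept` is only for illustration.

```julia
using DataFrames, DataFramesMeta

dd = DataFrame(a = [1, 2, 3], b = [1, missing, 0])

# `coalesce` returns its first non-missing argument:
# a `missing` comparison result falls back to `true`, a Bool passes through.
coalesce(missing == 0, true)   # true
coalesce(1 == 0, true)         # false

# Row-wise: `:b == 0` is `missing` on the second row, and `coalesce`
# turns that into `true`, so the row is kept alongside the `b == 0` row.
kept = @rsubset dd coalesce(:b == 0, true)

kept.a                         # [2, 3]
```

Unlike the `||` version, this form does not depend on operand order, since the comparison result is converted to a `Bool` before it is used.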

bkamins avatar Nov 29 '21 21:11 bkamins

I think we should just have a doc section for how to handle missings. @vjd if you have any students who would like to write one, that would be awesome! I can also get to it eventually.

pdeffebach avatar Nov 30 '21 15:11 pdeffebach

Yes, we will add a section on handling missings.

vjd avatar Dec 01 '21 08:12 vjd