document the use of `rsubset` to preserve missings
I am trying to subset the rows where `b == 0` while also keeping the rows where `b` is missing.
julia> dd = DataFrame(a = [1, 2, 3], b = [1, missing, 0])
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1        1
   2 │     2  missing
   3 │     3        0
julia> @chain dd begin
           @rsubset :b == 0 | ismissing(:b)
       end
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼───────────────
   1 │     3       0
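However, I learned on Slack that the comparison returns `missing` when `:b` is missing. Because `|` binds more tightly than `==`, the condition above parses as `:b == (0 | ismissing(:b))`, so the whole expression evaluates to `missing` for that row and the row is dropped. What we should do instead is: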
julia> @chain dd begin
           @rsubset ismissing(:b) || :b == 0
end
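Here the order matters: `||` short-circuits, so `ismissing(:b)` must come first. With the operands reversed, `:b == 0` evaluates to `missing` and `missing || ...` throws a `TypeError`, because `missing` cannot be used in a boolean context.
This is a subtle but important point that would be worth documenting.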
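The behavior described above can be checked without any packages. The following is a minimal sketch in plain Julia that applies each row-wise condition to the values of `b` directly, rather than through `@rsubset`; the function names are illustrative only and are not part of DataFramesMeta.

```julia
# Values of column `b` from the example above.
b = [1, missing, 0]

# `|` binds tighter than `==`, so this is `x == (0 | ismissing(x))`.
unparenthesized(x) = x == 0 | ismissing(x)

# With explicit parentheses, `missing | true` evaluates to `true`.
parenthesized(x) = (x == 0) | ismissing(x)

# Short-circuit form: `ismissing` runs first, so `x == 0` never sees `missing`.
missing_first(x) = ismissing(x) || x == 0

# Reversed order: `missing || ...` is a non-Boolean condition and throws.
comparison_first(x) = x == 0 || ismissing(x)

r_unparenthesized = map(unparenthesized, b)  # the `missing` row yields `missing`
r_parenthesized = map(parenthesized, b)      # the `missing` row yields `true`
r_missing_first = map(missing_first, b)      # the `missing` row yields `true`

# Record whether the reversed order throws a TypeError on a missing value.
reversed_throws = try
    comparison_first(missing)
    false
catch err
    err isa TypeError
end
```

A condition that evaluates to `missing` is not `true`, so the output shown earlier, where only the `b == 0` row survives, is consistent with the unparenthesized form.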
Or use `coalesce(:b == 0, true)`, which is exactly why `coalesce` exists.
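As a rough illustration of the `coalesce` suggestion, here is a plain-Julia sketch that builds the keep/drop mask by hand instead of going through `@rsubset`. The variable names are hypothetical; the second argument to `coalesce` decides what happens to rows where the comparison is `missing`.

```julia
# Values of column `b` from the example above.
b = [1, missing, 0]

# `x == 0` is `missing` for the missing row; `coalesce` replaces it with `true`,
# so that row is kept.
keep_missing = map(x -> coalesce(x == 0, true), b)

# Using `false` as the fallback drops the missing row instead.
drop_missing = map(x -> coalesce(x == 0, false), b)

kept_with_missing = b[keep_missing]
kept_without_missing = b[drop_missing]
```

Unlike the `||` form, this does not depend on operand order, since `coalesce` only replaces the result after the comparison has been evaluated.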
I think we should just have a doc section for how to handle missings. @vjd if you have any students who would like to write one, that would be awesome! I can also get to it eventually.
Yes, we will add a section on handling missings.