S4Vectors
`[` fails when the condition contains `NAs`
Hi there,
I have an issue with `[`, which is not consistent with base R. Normally, when one passes a condition containing NAs to `[`, a row of NAs is returned for each NA in the condition.
df <- data.frame(a = c(1, NA, 3), b = 1:3)
df[df$a > 0, ]
#     a  b
# 1   1  1
# NA NA NA
# 3   3  3
With S4Vectors this is not the case:
DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3)
DF[DF$a > 0, ]
# Error: logical subscript contains NAs
Do you have a plan to change this?
Regards, DK
Hi,
Right. It seems that row selection of DataFrame objects only supports NAs in numeric or character subscripts at the moment:
DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3, row.names=LETTERS[1:3])
DF[c(3, NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA
DF[c("C", NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA
but not in logical subscripts:
DF[c(FALSE, NA, TRUE), ]
# Error: logical subscript contains NAs
It looks like when we added support for NA subscripts a few years ago (see commit 85c3a56b2e5c69547481c83b31f32691fce93b2e), the logical case was overlooked. We'll work on this.
In the meantime, an easy workaround is to pass the logical subscript through which():
DF[which(DF$a > 0), ]
# DataFrame with 2 rows and 2 columns
#           a         b
#   <numeric> <integer>
# A         1         1
# C         3         3
Note that this drops the rows corresponding to NAs in the logical subscript, so it does not behave exactly like df[df$a > 0, ], which you could see as a good thing. What are all these rows filled with NAs good for anyway?
If you really want to mimic exactly what df[df$a > 0, ] does:
DF[seq_len(nrow(DF))[DF$a > 0], ]
# DataFrame with 3 rows and 2 columns
#              a         b
#      <numeric> <integer>
# A            1         1
# <NA>        NA        NA
# C            3         3
Ouch... ugly! Hopefully this is still somewhat helpful?
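Until NAs are supported in logical subscripts, the two workarounds above could be wrapped in a small helper so the seq_len() idiom does not have to be repeated. This is only a sketch: `lgl_to_idx()` and its `keep_na` argument are hypothetical names, not part of S4Vectors, and the function relies only on base R's `which()` and `seq_along()`. It assumes the logical subscript has one element per row.

```r
# Hypothetical helper (not part of S4Vectors): convert a logical subscript
# that may contain NAs into an integer subscript, which DataFrame row
# selection already accepts.
lgl_to_idx <- function(i, keep_na = FALSE) {
  stopifnot(is.logical(i))
  if (keep_na) {
    # Mimic base R: each NA in the condition becomes an NA index,
    # i.e. a row filled with NAs in the result.
    seq_along(i)[i]
  } else {
    # Keep only the positions where the condition is TRUE
    # (NA and FALSE are both dropped).
    which(i)
  }
}

cond <- c(1, NA, 3) > 0           # TRUE NA TRUE
lgl_to_idx(cond)                  # 1 3
lgl_to_idx(cond, keep_na = TRUE)  # 1 NA 3

# Only runs if S4Vectors is installed.
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3,
                             row.names = LETTERS[1:3])
  print(DF[lgl_to_idx(DF$a > 0), ])
  print(DF[lgl_to_idx(DF$a > 0, keep_na = TRUE), ])
}
```

The default drops the NA rows, like which(); `keep_na = TRUE` is for code that has to match the data.frame result row for row.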
Best, H.
@hpages Thanks, I'm fine with this for now. I'll implement a (temporary, I hope) workaround on my side.
Regards, DK
Hi guys,
Just to highlight the same problem, but with NaN:
DF <- S4Vectors::DataFrame(a = c(1, NaN))
DF[DF$a == 1, ]
# Error: logical subscript contains NAs
df <- data.frame(a = c(1, NaN))
df[df$a == 1, ]
# [1] 1 NA
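The NaN case appears to reduce to the NA case: in R, a comparison involving NaN yields NA, so the logical subscript ends up containing an NA. A short base R check, using only the values from the example above:

```r
x <- c(1, NaN)
cond <- x == 1   # comparing NaN with a number gives NA
cond             # TRUE NA
anyNA(cond)      # TRUE
which(cond)      # 1 (which() ignores NA, like FALSE)
cond %in% TRUE   # TRUE FALSE (a logical subscript with the NA turned into FALSE)
```

Either `which(cond)` or `cond %in% TRUE` gives a subscript without NAs, at the cost of dropping the NaN row instead of returning a row of NAs.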
Thanks for your attention. Regards, DK
@hpages @LiNk-NY thx a lot for your work on improving this - and pls let us know if we can help in any way (besides raising issues, thx to @gogonzo for that!)
@hpages any updates on this? Thanks!