S4Vectors

`[` fails when the condition contains `NAs`

Open gogonzo opened this issue 3 years ago • 5 comments

Hi there, I have an issue with `[`, which is not consistent with base R. Normally, when one passes a condition containing NAs into `[`, a row of NAs is returned for each NA in the condition.

df <- data.frame(a = c(1, NA, 3), b = 1:3)
df[df$a > 0, ]

#     a  b
# 1   1  1
# NA NA NA
# 3   3  3

With S4Vectors this is not the case:

DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3)
DF[DF$a > 0, ]
# Error: logical subscript contains NAs

Do you have a plan to change this?

Regards, DK

gogonzo, Nov 05 '21 10:11

Hi,

Right. It seems that row selection of DataFrame objects only supports NAs in numeric or character subscripts at the moment:

DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3, row.names=LETTERS[1:3])
DF[c(3, NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA

DF[c("C", NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA

but not in logical subscripts:

DF[c(FALSE, NA, TRUE), ]
# Error: logical subscript contains NAs

It looks like when we added support for NA subscripts a few years ago (see commit 85c3a56b2e5c69547481c83b31f32691fce93b2e), the logical case was overlooked. We'll work on this.

In the meantime, an easy workaround is to pass the logical subscript through which():

DF[which(DF$a > 0), ]
# DataFrame with 2 rows and 2 columns
#           a         b
#   <numeric> <integer>
# A         1         1
# C         3         3

Note that this drops the rows corresponding to NAs in the logical subscript, so it does not behave exactly like df[df$a > 0, ], which you could see as a good thing. What are all these rows filled with NAs good for anyway?

If you really want to mimic exactly what df[df$a > 0, ] does:

DF[seq_len(nrow(DF))[DF$a > 0], ]
# DataFrame with 3 rows and 2 columns
#              a         b
#      <numeric> <integer>
# A            1         1
# <NA>        NA        NA
# C            3         3

Ouch... ugly! Hopefully this is still somewhat helpful?
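To see why the two workarounds above differ, it may help to look at the integer subscripts they produce. This is a minimal base R sketch (no S4Vectors needed); the names `keep`, `idx_drop` and `idx_keep` are only for illustration:

```r
# Logical condition with an NA, like DF$a > 0 in the examples above
a <- c(1, NA, 3)
keep <- a > 0                      # TRUE NA TRUE

# which() returns only the positions that are TRUE, so the NA disappears
idx_drop <- which(keep)

# Indexing an integer sequence by the logical vector keeps the NA in place
idx_keep <- seq_along(keep)[keep]

print(idx_drop)
print(idx_keep)
```

Both results are integer subscripts, possibly containing NA, which is the kind of subscript that DataFrame row selection is described above as supporting.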

Best, H.

hpages, Nov 05 '21 17:11

@hpages Thanks, I'm fine with this so far, I'll implement a (temporary, I hope) workaround on my side.

Regards, DK

gogonzo, Nov 10 '21 13:11

Hi guys, just to highlight the same problem, but with NaN:

DF <- S4Vectors::DataFrame(a = c(1, NaN))
DF[DF$a == 1, ]
# Error: logical subscript contains NAs

df <- data.frame(a = c(1, NaN))
df[df$a == 1, ]
# [1]  1 NA
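The error message mentions NAs rather than NaN because, in base R, comparing NaN with a number yields NA, so the logical subscript itself contains an NA. A small base R sketch (the name `cond` is only illustrative):

```r
# Comparing NaN with a number gives NA, not FALSE
cond <- c(1, NaN) == 1
print(cond)

# The logical subscript therefore contains an NA,
# which is what the DataFrame error message reports
print(anyNA(cond))
```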

Thanks for your attention, Regards, DK

gogonzo, Nov 11 '21 13:11

@hpages @LiNk-NY thx a lot for your work on improving this - and pls let us know if we can help in any way (besides raising issues, thx to @gogonzo for that!)

danielinteractive, Nov 12 '21 12:11

@hpages any updates on this? Thanks!

LiNk-NY, Jul 17 '23 13:07