assert has error for single-row data frames and multiple columns

Open jayqi opened this issue 4 years ago • 2 comments

assert errors when making an assertion on multiple columns for a single-row data frame.

My guess is that the internals somewhere incorrectly treat the row as a single vector, rather than as multiple length-one vectors. assert works fine with one-row-one-column or multiple-rows-multiple-columns data frames.

Reproducible example, asserting on two columns with a single-row data frame:

library(assertr)
library(magrittr)
head(mtcars, 1) %>% assert(not_na, am, vs)
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent
packageVersion("assertr")
#> [1] '2.7'

Note that this error does not occur when asserting about only a single column for a single-row data frame:

library(assertr)
library(magrittr)
head(mtcars, 1) %>% assert(not_na, am)
#>           mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

or asserting about multiple columns with multiple-row data frames:

library(assertr)
library(magrittr)
head(mtcars, 2) %>% assert(not_na, am, vs)
#>               mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Created on 2020-03-05 by the reprex package (v0.3.0)

jayqi, Mar 05 '20 19:03

I have this same issue. Is there any resolution or workaround for this at present?

mlamias, May 31 '22 17:05

Sorry for the delay. I can't replicate this bug on version 2.8. Can you try that? It should be on CRAN.

tonyfischetti, Jul 10 '22 15:07

Hi, I am experiencing an issue similar to the one @jayqi described at the beginning of this thread. I also cannot reproduce the original behaviour in version 2.8, but I have found a different problem. In my case, assert does not report the violation properly for a single-row data frame that actually contains a value violating the condition being checked.

packageVersion("assertr")
#> [1] ‘2.8’

Here is a reproducible example for 4 different data frames:

library(assertr)
library(magrittr)
data("mtcars")

Defining 4 different data frames:

data_multirow <- mtcars
data_multirow_with_NA <- data_multirow
data_multirow_with_NA[1,1] <- NA
data_single_row <- head(data_multirow, n = 1)
data_single_row_with_NA <- head(data_multirow_with_NA, n = 1)

Checking the mpg column across 4 data frames:

  • multi-row data frame without any NAs
data_multirow %>% assert(not_na, mpg)
# prints the data frame
  • multi-row data frame with NA
data_multirow_with_NA %>% assert(not_na, mpg)
#> Column 'mpg' violates assertion 'not_na' 1 time
#> verb redux_fn predicate column index value
#> 1 assert       NA    not_na    mpg     1    NA
#> 
#> Error: assertr stopped execution
  • single-row data frame without any NAs
data_single_row %>% assert(not_na, mpg)
# prints the data frame
  • single-row data frame with NA
data_single_row_with_NA %>% assert(not_na, mpg)
#> Error: assertr stopped execution

As you can see, the last example stops execution but does not print the violation report that the multi-row case prints.

After some investigation, it looks like the issue is in using the sapply function when creating log.mat inside the assert function definition:

log.mat <- sapply(colnames(sub.frame), function(column) {
  this.vector <- sub.frame[[column]]
  return(apply.predicate.to.vector(this.vector, predicate))
})

When the input is a multi-row data frame, sapply simplifies the result to a matrix with one column per checked column. When the input has only one row, every call returns a length-one vector, so sapply simplifies the result to a plain named vector instead. Later, in the part where the errors are assigned (errors <- lapply(...)), colnames(log.mat) is therefore NULL, as there are no columns. I believe that is why no error entries are produced for single-row data frames that do violate the assertion, even though they should be.
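The shape change described above can be reproduced in base R without assertr. The sketch below is only an illustration: apply_predicate, not_missing, build_log and build_log_stable are hypothetical stand-ins, not assertr's internal functions, and the shape-stable variant is one possible approach, not necessarily the fix the maintainer applied.

```r
# Hypothetical stand-in for assertr's internal helper:
# applies a predicate to each element and returns one logical per element.
apply_predicate <- function(vec, predicate) vapply(vec, predicate, logical(1))
not_missing <- function(x) !is.na(x)

multi_row  <- head(mtcars, 2)[, c("am", "vs")]
single_row <- head(mtcars, 1)[, c("am", "vs")]

# Same pattern as the snippet quoted from assert(): sapply over column names.
build_log <- function(df) {
  sapply(colnames(df), function(column) {
    apply_predicate(df[[column]], not_missing)
  })
}

log_multi  <- build_log(multi_row)
log_single <- build_log(single_row)

is.matrix(log_multi)   # TRUE
colnames(log_multi)    # "am" "vs"
is.matrix(log_single)  # FALSE: simplified to a named logical vector
colnames(log_single)   # NULL

# One shape-stable alternative (an assumption, not assertr's code):
# collect the per-column results with lapply and build the matrix explicitly.
build_log_stable <- function(df) {
  cols <- lapply(colnames(df), function(column) {
    apply_predicate(df[[column]], not_missing)
  })
  matrix(unlist(cols), nrow = nrow(df),
         dimnames = list(NULL, colnames(df)))
}

log_single_stable <- build_log_stable(single_row)
dim(log_single_stable)       # 1 2
colnames(log_single_stable)  # "am" "vs"
```

Because the explicit matrix() call fixes the number of rows, the result keeps its column names regardless of how many rows the input has, which is what code indexing by colnames(log.mat) would need.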

@tonyfischetti could you please have a look at this? I will be grateful for a comment regarding this issue :)

mariadrywien, Oct 20 '22 15:10

@mariadrywien I'm working on this now. Should be done by the end of the day.

tonyfischetti, Nov 05 '22 20:11

@mariadrywien Just fixed it. It's in master now, and I'm submitting it to CRAN.

tonyfischetti, Nov 05 '22 22:11

@tonyfischetti Thanks a lot! Works fine now 🙂

mariadrywien, Nov 10 '22 07:11

Hi @tonyfischetti, unfortunately the issue is still present when assert is used with multiple columns. Please check this example (based on the previous one):

library(assertr)
library(magrittr)
data("mtcars")

data_multirow_with_NA <- mtcars
data_multirow_with_NA[1,1] <- NA
data_single_row_with_NA <- head(data_multirow_with_NA, n = 1)

# single-row data frame with NA
data_single_row_with_NA %>% assert(not_na, mpg, cyl)

#> Error: assertr stopped execution

jakubnowicki, Apr 21 '23 08:04