
Enabling visdat to ignore minor differences

Open · BobMuenchen opened this issue 5 years ago · 1 comment

It would be helpful if vis_compare() had an argument telling it to ignore minor differences such as a shifted column position, a different storage type, or a different sort order. The functions dplyr::setequal() and compare::compare() can detect those changes and tell you that the data frames are otherwise the same:

```r
# Ways to compare data frames
library("tidyverse")  # select(), arrange() and dplyr's setequal()
library("visdat")     # vis_compare()

names(mtcars)
mtcars2 <- mtcars

# Change cyl to character vector
mtcars2$cyl <- as.character(mtcars2$cyl)

# Visualize the column differences (this will show any differences)
vis_compare(mtcars, mtcars2)

# Change variable order
mtcars2 <- select(mtcars2, wt:carb, mpg:drat)

# vis_compare doesn't know how to ignore this type of difference
vis_compare(mtcars, mtcars2)

# Change row order by sorting
mtcars2 <- arrange(mtcars2, mpg)

# Three ways to compare
identical(mtcars, mtcars2)
all.equal(mtcars, mtcars2)

# setequal figures out what happened, but doesn't report the different sort order
setequal(mtcars, mtcars2)

# The compare package reports even the sort difference
install.packages("compare")
library("compare")

# This tests one column at a time
compare(mtcars, mtcars2)

# This figures out all changes
compare(mtcars, mtcars2, allowAll = TRUE)

# visdat can't ignore those relatively minor changes
vis_compare(mtcars, mtcars2)
```

BobMuenchen, Sep 03 '18 12:09
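Until vis_compare() has such an argument, one possible workaround is to put both data frames into a canonical form before comparing them. The sketch below assumes a hypothetical helper, `normalize_df()` (not part of visdat or any package), that sorts columns by name, stores every column as character, and sorts rows by all columns. It uses base R only, so the issue's changes are recreated without dplyr.

```r
# Hypothetical helper (not part of visdat): put a data frame into a canonical
# form so that column order, column storage type and row order no longer matter.
normalize_df <- function(df) {
  df <- as.data.frame(df)                 # drop tibble classes, if any
  df <- df[sort(names(df))]               # ignore column order
  df[] <- lapply(df, as.character)        # ignore storage type (numeric vs character)
  row_order <- do.call(order, unname(as.list(df)))
  df <- df[row_order, , drop = FALSE]     # ignore row order: sort by every column
  rownames(df) <- NULL                    # row names would still encode the old order
  df
}

# Recreate the three changes from the issue using base R only
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)  # storage type
mtcars2 <- mtcars2[c(6:11, 1:5)]          # same columns as select(mtcars2, wt:carb, mpg:drat)
mtcars2 <- mtcars2[order(mtcars2$mpg), ]  # sort rows by mpg, as arrange(mtcars2, mpg) does

identical(mtcars, mtcars2)                              # FALSE
identical(normalize_df(mtcars), normalize_df(mtcars2))  # TRUE
```

Because the normalized frames share dimensions and column order, they could be passed to vis_compare() in place of the originals. One caveat of this design: sorting on every column means a genuine value change can move a row, so the two frames may no longer line up row by row; sorting on a key column, when one exists, would avoid that.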

Thank you for taking the time to write this!

The compare package looks like a great approach to this; hopefully it can make it into the next release of visdat.

njtierney, Sep 19 '18 03:09
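As a rough illustration of the kind of report such an option might produce, here is a base-R sketch that lists which minor differences separate two data frames. The function `minor_differences()` and its messages are hypothetical; they are not the output format of compare() or visdat.

```r
# Hypothetical helper (not part of visdat or compare): list which "minor"
# differences (column order, column type, row order) separate two data frames.
minor_differences <- function(x, y) {
  x <- as.data.frame(x)
  y <- as.data.frame(y)
  if (!base::setequal(names(x), names(y)) || nrow(x) != nrow(y)) {
    return("not comparable: column names or row counts differ")
  }
  found <- character(0)
  if (!identical(names(x), names(y))) {
    found <- c(found, "column order differs")
  }
  y <- y[names(x)]                        # align y's columns with x
  for (nm in names(x)) {
    cls_x <- class(x[[nm]])[1]
    cls_y <- class(y[[nm]])[1]
    if (cls_x != cls_y) {
      found <- c(found, sprintf("column '%s' type differs: %s vs %s", nm, cls_x, cls_y))
    }
  }
  # Compare values as text, ignoring row names
  x[] <- lapply(x, as.character)
  y[] <- lapply(y, as.character)
  rownames(x) <- NULL
  rownames(y) <- NULL
  if (!identical(x, y)) {
    sort_rows <- function(df) {
      df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
      rownames(df) <- NULL
      df
    }
    found <- c(found, if (identical(sort_rows(x), sort_rows(y))) "row order differs" else "values differ")
  }
  found
}

# Reverse the columns and sort the rows by mpg
minor_differences(mtcars, mtcars[order(mtcars$mpg), rev(names(mtcars))])
# "column order differs" "row order differs"
```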