
Inconsistent UNF values

Open dhicks opened this issue 4 years ago • 2 comments

This morning I'm working with some data that hasn't been touched since November (over 7 months ago). I maintain this data, it lives on my personal machine, and I use UNF to validate which version of the dataset I'm working with. Today I'm getting UNF values that are inconsistent with the values calculated last November. I'm getting similar inconsistencies for some of the examples in ?unf (shown below), in particular for unf(longley, ver=4, digits=3) and for unf(cbind.data.frame(x1,x2),ver=3) and its equivalents. The UNFs for my data were calculated using version 6.

Both calculations were done using UNF version 2.0.6 on the same machine. One potential difference is that last November I was using R 3.5.1 and today I'm using R 4.0.0.
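Since the R version is the one known difference between the two runs, it may help to store the environment next to each fingerprint when it is recorded. The following is only a sketch in base R: `record_fingerprint()` is a hypothetical helper, not part of the UNF package, and the fingerprint string is copied from the reprex output in this report.

```r
# Hypothetical helper (not part of the UNF package): keep a fingerprint string
# together with the R version and package version that produced it.
record_fingerprint <- function(fingerprint, pkg = "UNF") {
  # Fall back to NA when the package is not installed in this library.
  pkg_version <- tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
  data.frame(
    fingerprint = fingerprint,
    r_version   = as.character(getRversion()),
    pkg         = pkg,
    pkg_version = pkg_version,
    recorded_on = as.character(Sys.Date()),
    stringsAsFactors = FALSE
  )
}

# Example with a fingerprint string taken from the reprex output in this report.
rec <- record_fingerprint("UNF6:ukDZSJXck7fn4SlPJMPFTQ==")
print(rec)
```

With a record like this, a later mismatch can at least be lined up against a change in R or in the package.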

Please specify whether your issue is about:

  • [X] a possible bug
  • [ ] a question about package functionality
  • [ ] a suggested code or documentation change, improvement to the code, or feature request

Put your code here:


# Version 6 #


# Version 5 #
## vectors

### just numerics
unf5(1:20) # UNF:5:/FIOZM/29oC3TK/IE52m2A==
#> UNF5:/FIOZM/29oC3TK/IE52m2A==
unf5(-3:3, dvn_zero = TRUE) # UNF:5:pwzm1tdPaqypPWRWDeW6Jw==
#> UNF5:pwzm1tdPaqypPWRWDeW6Jw==

### characters and factors
unf5(c('test','1','2','3')) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
#> UNF5:fH4NJMYkaAJ16OWMEE+zpQ==
unf5(as.factor(c('test','1','2','3'))) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
#> UNF5:fH4NJMYkaAJ16OWMEE+zpQ==

### logicals
unf5(c(TRUE,TRUE,FALSE), dvn_zero=TRUE) # UNF:5:DedhGlU7W6o2CBelrIZ3iw==
#> UNF5:DedhGlU7W6o2CBelrIZ3iw==

### missing values
unf5(c(1:5,NA)) # UNF:5:Msnz4m7QVvqBUWxxrE7kNQ==
#> UNF5:Msnz4m7QVvqBUWxxrE7kNQ==

## variable order and object structure is irrelevant
unf(data.frame(1:3,4:6,7:9)) # UNF:5:ukDZSJXck7fn4SlPJMPFTQ==
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==

# Version 4 #
# version 4
unf(longley, ver=4, digits=3) # PjAV6/R6Kdg0urKrDVDzfMPWJrsBn5FfOdZVr9W8Ybg=
#> UNF4:3,128:KjRoxvNqv+Gkbso2DZ5N3lztfFYA02PPy8KlAByze9s=

# version 4.1
unf(longley, ver=4.1, digits=3) # 8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=
#> UNF4.1:3,128:8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=

# Version 3 #
x1 <- 1:20
x2 <- x1 + .00001

unf3(x1) # HRSmPi9QZzlIA+KwmDNP8w==
#> UNF3:M+FD+2bN2GJGqHJmhZeWig==
unf3(x2) # OhFpUw1lrpTE+csF30Ut4Q==
#> UNF3:cN+0PxPJHvbQQd5I+pLKpg==

# UNFs are identical at specified level of rounding
identical(unf3(x1), unf3(x2))
#> [1] FALSE
identical(unf3(x1, digits=5),unf3(x2, digits=5))
#> [1] TRUE

# dataframes, matrices, and lists are all treated identically:
unf(cbind.data.frame(x1,x2),ver=3) # E8+DS5SG4CSoM7j8KAkC9A==
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==
unf(list(x1,x2), ver=3)
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==
unf(cbind(x1,x2), ver=3)
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==

#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.5
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#> [1] UNF_2.0.6
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
#>  [5] base64enc_0.1-3 yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6  
#>  [9] rmarkdown_2.1   highr_0.8       knitr_1.28      stringr_1.4.0  
#> [13] xfun_0.13       digest_0.6.25   rlang_0.4.6     evaluate_0.14

Created on 2020-06-27 by the reprex package (v0.3.0)
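To make the pattern in the reprex easier to see, here is a small base-R sketch that lines up the documented values (the trailing comments from ?unf) against the values printed above, for those examples where both appear. It does not call the UNF package; the strings are copied from this report, and `hash_part()` is a hypothetical helper that keeps only the text after the last colon, so the differing `UNF:5:` and `UNF5:` prefixes are ignored.

```r
# Documented vs. observed values, copied from the reprex in this report.
checks <- data.frame(
  call = c(
    "unf5(1:20)",
    "unf(longley, ver=4, digits=3)",
    "unf(longley, ver=4.1, digits=3)",
    "unf3(x1)",
    "unf3(x2)",
    "unf(cbind.data.frame(x1,x2),ver=3)"
  ),
  documented = c(
    "UNF:5:/FIOZM/29oC3TK/IE52m2A==",
    "PjAV6/R6Kdg0urKrDVDzfMPWJrsBn5FfOdZVr9W8Ybg=",
    "8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=",
    "HRSmPi9QZzlIA+KwmDNP8w==",
    "OhFpUw1lrpTE+csF30Ut4Q==",
    "E8+DS5SG4CSoM7j8KAkC9A=="
  ),
  observed = c(
    "UNF5:/FIOZM/29oC3TK/IE52m2A==",
    "UNF4:3,128:KjRoxvNqv+Gkbso2DZ5N3lztfFYA02PPy8KlAByze9s=",
    "UNF4.1:3,128:8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=",
    "UNF3:M+FD+2bN2GJGqHJmhZeWig==",
    "UNF3:cN+0PxPJHvbQQd5I+pLKpg==",
    "UNF3:eIjrbuHf+6rWU/XD+4F7+g=="
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop everything up to and including the last colon,
# leaving only the base64 hash.
hash_part <- function(x) sub("^.*:", "", x)

checks$match <- hash_part(checks$documented) == hash_part(checks$observed)

# Calls whose observed hash differs from the documented one.
mismatched <- checks$call[!checks$match]
print(mismatched)
```

For the values in this report, the version 3 and version 4 examples differ from the documentation, while the version 4.1 and version 5 examples agree.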

dhicks, Jun 27 '20 16:06